USAGE ANALYSIS IN LEARNING SYSTEMS
Edited by Christophe Choquet, Vanda Luengo, and Kalina Yacef
Association for the Advancement of Computing in Education www.aace.org
Usage Analysis in Learning Systems
CHRISTOPHE CHOQUET LIUM/IUT de Laval, France [email protected]
VANDA LUENGO LIG/University Joseph Fourier, Grenoble, France [email protected]
KALINA YACEF University of Sydney, Australia [email protected]
This book explores different approaches for analyzing student data collected electronically in computer-based learning activities in order to support teaching and learning. The articles selected for this special issue are extended versions of the papers presented at the workshop on Usage Analysis in Learning Systems, organized in conjunction with the 12th International Conference on Artificial Intelligence in Education (AIED 2005, Amsterdam, The Netherlands, July 2005). Learning systems track student usage in order to dynamically adapt some aspects of the content, format and order of the learning material to the individual student. These large amounts of student data can also offer material for further analysis using statistical and data mining techniques. In this emerging field of research, approaches for such analysis vary greatly depending on the targeted user (e.g., teacher, learner, instructional designer, education researcher, parent, software engineer) and on what the analysis is intended for. The articles presented in this special issue cover various stages of the data's "life cycle":

Usage tracking modeling: Choquet and Iksal recommend that the software development process should explicitly integrate a usage analysis phase, as it can provide designers with significant information on how their systems are used for re-engineering purposes. They present a generic usage tracking language (UTL) and describe an instantiation with IMS Learning Design.

Usage data analysis: Scheuer, Mühlenbrock and Melis propose a system,
SIAM (System for Interaction Analysis by Machine learning), working towards an automatic analysis of interaction data based on a standard database server and machine learning techniques. Feng and Heffernan present their web-based system (ASSISTment), which was used to support teachers in middle school mathematics classes and which provides a range of real-time reports to teachers in their classrooms.

Usage data visualization: Avouris, Fiotakis, Kahrimanis, Margaritis and Komis propose a software environment, the Collaborative Analysis Tool (ColAT), that supports the inter-relation of resources in order to analyse the collected evidence and produce interpretative views of the activity. Mazza and Botturi present GISMO, an open source, graphic student-tracking tool integrated into Moodle. GISMO provides visualizations of behavioral, cognitive and social data from the course, allowing constant monitoring of students' activities, engagement and learning outcomes. Marty, Heraud, Carron and France observe learners' behavior within a web-based learning environment in order to understand it. The observations are gathered from various sources and form a trace, which is then visualized.

Usability of data: Barre, Choquet, and El-Kechaï propose design patterns for recording and analyzing usage in learning systems, taking into account multiple perspectives: the role of the user, the purpose of the data analysis, and the type of data. Delozanne, Le Calvez, Merceron, and Labat present design patterns whose objective is to provide support for designers to track and analyze the use of a learning system by its different actors.

We hope this special issue will stimulate ideas to grow this exciting field of research! We would like to thank all the reviewers of this special issue for their constructive feedback.

Christophe Choquet, Vanda Luengo and Kalina Yacef
Modeling Tracks for the Model Driven Re-engineering of a TEL System
CHRISTOPHE CHOQUET LIUM/IUT de Laval, France [email protected]
SEBASTIEN IKSAL LIUM/IUT de Laval, France [email protected]
In the context of distance learning and teaching, the re-engineering process needs feedback on the learners' usage of the learning system. This feedback is given by numerous vectors, such as interviews, questionnaires, videos or log files. We consider that it is important to interpret tracks in order to compare the designer's intentions with the learners' activities during a session. In this article, we present the Usage Tracking Language (UTL). This language was designed to be generic, and we present an instantiation of a part of it with IMS Learning Design, the representation model we chose for our three years of experiments. At the end of the article, we describe several use cases of this language, based on our experiments.
Introduction
Nowadays, numerous interactive systems are available on the Web. Most of these systems need some kind of feedback on their usage in order to be improved. In the specific context of distance learning and teaching, the desynchronization between teachers' two major roles, instructional designer and tutor, brings about a lack of usage feedback. The software development process should explicitly integrate a usage analysis phase, which can provide designers with significant information on their systems' uses for re-engineering purposes (Corbière & Choquet, 2004a). The Semantic Web aims at facilitating data management on the Web. It brings languages, standards and corresponding tools that make it easier to share data and to build automatic and semi-automatic programs (Berners-Lee, 1999). Automatic usage
analysis is often performed by mathematicians or computer engineers. In order to facilitate the appropriation, comprehension and interpretation of the results by instructional designers, who are the main actors of an e-learning system development process, we think they should be fully integrated in this analysis phase.

The research contribution we present in this article is fully in line with our approach to the engineering and re-engineering of e-learning systems, where we particularly stress the need for a formal description of the design view, in terms of scenarios and learning resources, to help the analysis of observed uses (i.e., descriptive scenarios) and to compare them with the designer's intention (i.e., the predictive scenario) (Corbière & Choquet, 2004b; Lejeune & Pernin, 2004), in order to enhance the quality of the learning. When designers use an Educational Modeling Language (EML) such as Learning Design (Koper, Olivier, & Anderson, 2003), proposed by the IMS Global Learning Consortium (IMS, 2006), to make explicit their intentions regarding the learners' activities during a session, a set of observation needs is implicitly defined. Thus, one of the difficulties of student data analysis resides in the correlation between these needs and the tracking means provided by the educational environment (not only the computer-based system, but the whole learning organization, including humans and data collection vectors such as video recorders, questionnaires, etc.).

Our aim is to provide the actors of a Technology Enhanced Learning (TEL) system with a language dedicated to the description of the tracks and their semantics, including the definition of the observation needs and the means required for data acquisition. This language, the Usage Tracking Language (UTL), aims to be neutral regarding technologies and EMLs, and has to be instantiated on the EML used to define the pedagogical scenario and on the tracks' format. Moreover, this language allows the structuring of tracks, from raw data – those acquired and provided by the educational environment during the learning session – to indicators (ICALTS, 2004) which mean something significant for their user. Indicators usually denote a significant fact or event that happened during the learning session, on which users (designers, tutors, learners, and analysts) could base some conclusions concerning the quality of the learning, the interaction or the learning environment itself. To date, a first part of our proposal has been developed; it focuses on the transformation of tracks by adding semantics.

Systems which need to analyse user behaviour work either with data-mining techniques (Mostow, 2004) or by hand. These techniques are often used to build user models or to adapt the content or the layout to the user (Zheng, Fan, Huan, Yin, Wei-Ying, & Liu, 2002). They are based on statistical or mathematical analyses (Bazsaliscza & Naim, 2001). Very often, these kinds of analysis work on raw data in order to bring out indicators such as
the most frequently used keyword. In our case, we are interested in analysing the user behaviour in order to improve the pedagogical scenario and the learning materials. Our proposal consists of an analysis driven by models. We want to guide the data analysis by means of the instantiated learning scenario, so we focus on the representation model of a scenario and on the tracks' format.

The next section deals with the different viewpoints one could have on tracks depending on one's role: designer, tutor, learner, or analyst. We introduce here the DGU model, for Defining, Getting and Using tracks, which distinguishes three different facets of a track. Then, after having identified several kinds of data which are relevant and used for analyzing a learning session, we present their information models. The third section of this article presents the operational part of the Usage Tracking Language, namely UTL.1, which allows the description of the semantics of the tracks recorded by an LMS and their linking to the observation needs defined in the predictive scenario. This language can be instantiated both on the formal language used to describe the pedagogical scenario and on the track file format implemented by the LMS. It can be considered as a first operational version (i.e., a subset) of the language presented in the previous section, which aims to cover a wider scope of tracks and situations. In a fourth part, we provide several use cases which highlight the possibilities of this language. Finally, we conclude this article with some ideas concerning the next version of UTL, namely UTL.2.

All the examples cited in this article are taken from a number of tests we have made with our students over the last three years. They all concern a learning system composed of six activities designed for teaching network services programming skills. We used the Free Style Learning system (Brocke, 2001), based on the Open-USS LMS (Grob, Bensberg, & Dewanto, 2004), in which students can navigate as they choose between the activities. Our designers have defined a predictive scenario and, each year, we have compared this scenario with the descriptive ones, by hand, for re-engineering purposes.

TRACKS MODEL

Three Viewpoints on Tracks: The DGU Model
The practice of tracking is common (Hanson, Kraut, & Farber, 1984). The main problem of this method is the significant amount of data recorded. Therefore, one has to formulate hypotheses in order to conduct the analysis, extract relevant information and link it. The relevance of this method was studied by Teubner and Vaske (1988). In our re-engineering framework, we stress the need to model tracks before the learning session, and to consider tracks as pedagogical objects, like any other learning objects, such as resources or scenarios for instance.
While this is frequently the case in existing systems when the tracking purpose is to provide learners and/or the system with useful information, it is more unusual when the purpose is to support the tutor, and rare when it is to provide feedback to designers. With this in mind, the designer who is engaged in the engineering process of a TEL system should, as far as possible, model the tracking needs of the learning session (Barré, El Kechaï, & Choquet, 2005). Then, the tracking needs should be instantiated on the educational environment in order to model the effective tracking means. Finally, one should also model the expected uses of these tracks, in terms of building the descriptive scenario for analyzing the usage. This is the way we have defined the three facets of track modeling:
• the Defining (D) facet, which models the track needs;
• the Getting (G) facet, which models the track means;
• the Using (U) facet, which models the track uses.
In some cases, data acquired during a session are unexpected. If they have some utility, they also need to be modeled along these three facets in order to be incorporated in the descriptive scenario of the learning session, possibly bringing up a new re-engineering cycle of the TEL system. Figure 1 shows a general overview of the DGU model.

Track Conceptual Model
Some recent European works have focused on tracking issues and have provided outcomes on track representation, acquisition and analysis. Most of these works (DPULS, 2005; ICALTS, 2004; IA, 2005; TRAILS, 2004) have taken place in the Kaleidoscope European Network of Excellence (Kaleidoscope, 2004), and each of these projects has contributed to the modeling of tracks.
Figure 1. The DGU model
The TRAILS project (Personalized and Collaborative Trails of Digital and Non-Digital Learning Objects) investigates the trails that learners follow and create as they navigate through a space of learning objects. Learners interact with learning objects in the form of trails – time-ordered sequences of learning objects. TRAILS is focused on the tracking of individual routes through learning objects, in order to identify the cognitive trail by analyzing the sequence of the learning objects the learner has consulted. This approach is close to (Champin, Prié, & Mille, 2003) and (Egyed-Zsigmond, Mille, & Prié, 2003), where the authors consider user tracks in a hypermedia system as a sequence of actions and use them for identifying the general goal of the user. Although, in this article, we do not restrict the meaning of a track to a sequence of actions, but define it more generally as a datum which provides information on the learning session, we think, as the TRAILS project does, that tracks have to be abstracted in such a way that they are useful for understanding the cognitive trail of the learner. Moreover, we also think that a track has to be modeled as a descriptive scenario (emergent trail for TRAILS) and to be linked to, and compared with, the predictive scenario (planned trails for TRAILS).

The ICALTS project (Interaction & Collaboration Analysis' supporting Teachers & Students' Self-regulation) and the IA project (Interaction Analysis - supporting participants in technology based learning activities), which is a follow-up of ICALTS,
…propose that the design of technology based learning environments must not be limited to the initial means of action and communication, but should be extended by providing means of analysis of the very complex interactions that occur, when the participants of the learning activities work in individual or collaborative mode (presentation of the Interaction Analysis project, at http://www.noe-kaleidoscope.org/pub/researcher/activities/jeirp/).
This position leads, as in our own approach, to considering the definition of usage analysis means as a pedagogical design task, and the tracks themselves as pedagogical objects. These projects have introduced and defined the concept of Interaction Analysis Indicator, as
…variables that describe something related to: (a) the mode or the process or the quality of the considered cognitive system learning activity (task related process or quality), (b) the features or the quality of the interaction product and/or (c) the mode, the process or the quality of the collaboration, when acting in the frame of a social context forming via the technology based learning environment (Kaleidoscope, n.d., p.12).
The DPULS project (Design Patterns for recording and analyzing Usage of Learning Systems) dealt especially with analyzing the usage of e-learning systems in real training or academic contexts, in order to help teachers re-design alternatives to cope with observed difficulties in the learning scenario. This project has defined a structured set of design patterns (accessible at: http://lucke.univ-lemans.fr:8080/dpuls/login.faces), thus providing instructional designers, teachers and tutors with experimented and possibly reusable patterns to support them in analyzing recurrent problems and tracking students' activity. This project has enlarged the definition of the concept of Indicator, as a feature of a datum (usually of a derived datum): an indicator highlights a relation between the datum and an envisaged event which has a pedagogical significance. It has also proposed a definition for some concepts related to tracks: raw-datum (recorded by the system), additional-datum (a datum which is not calculated or recorded, but linked to the learning situation, such as the predictive scenario or a domain taxonomy), and derived-datum (calculated from other data).

All of these projects have influenced our proposal. We have identified two main data types for tracks: the derived-datum type and the primary-datum type. Primary data are not calculated or elaborated with the help of other data or knowledge. They can be recorded before, during or after the learning session by the learning environment: for instance, a log file recorded by the system, a video tape of the learner during the session, a questionnaire acquired before or after the session, or the set of posts in a forum. This kind of data is classified as raw-datum. The content-datum type concerns the outcomes produced by the learning session actors (learners, tutors and/or teachers). These data are mainly the productions of the learners, intended to be assessed, but they could also be, for instance, a tutor report on the activity of a learner or on the use of a resource. Both of these data types have to be identified in the collection of tracks provided by the learning environment, in terms of location and format. We introduce the keyword and value elements for this purpose; these elements will be discussed in the next section. The additional-datum type qualifies, as in DPULS, a datum which is linked to the learning situation and could be involved in the usage analysis. Derived data are calculated or inferred from primary data or other derived data. The indicator type qualifies derived data which have a pedagogical significance. Thus, an indicator is always relevant to a pedagogical context: it is always defined for at least one exploitation purpose, and linked to at least one concept of the scenario. We will detail this specific aspect further in the article. A derived datum which has to be calculated but which has no pedagogical significance is qualified as an intermediate-datum. We will now detail the information model of each data type of this model. The formalism used in the following schemas follows the IMS Learning Design Information Model (IMS/LD, 2003) notation.
Figure 2. The conceptual model of a track
• Only elements are shown (no attributes).
• The diagrams are tree structures, to be read from left to right. An element on the left contains the elements on its right. The leftmost element is the top of the tree.
• An OR relationship in the diagrams is represented with <.
• An AND relationship in the diagrams is represented with [.
• * means that the element occurs zero or more times in the container.
• + means that the element occurs one or more times in the container.
• ? means that the element is optional.
• When none of the symbols (*, +, ?) is placed before the element name, the element occurs exactly once.
Each data type has three facets (Defining, Getting, Using) and follows the DGU model we have defined. This model allows two processes for modeling a datum: the predicted one, when the designers declare the datum as needed during the design phase, and the unpredicted one, when the datum is collected or calculated without an explicit designer's request. In the first process, the Defining and Using facets are filled first; then the Getting facet is discussed with the developers. In this way one could provide, for instance, examples and descriptions rather than a specific technique or tool. In the second process, developers and/or analysts fill the Getting facet first, then the Using and Defining facets are discussed with the designers.
The Raw-datum Information Model
Raw-datum.Defining is composed of the Title of the datum, which should be a short, meaningful string. If necessary, a more detailed Description can be added.

Raw-datum.Getting focuses on the means for acquiring the datum. It is composed of the Collection-type element, which can be a Human-collection, operated by at least one Role (for instance, an observer of the session) with a specific Collection-vector (for instance, a video recorder, or paper and pen), or an Automatic-collection. The latter kind of collection is characterized by the nature (e.g., log file, chat, mail) of the collection – the Record-type, which takes its values in an open list – and by the Record-tool. If this tool is already available in the learning environment, one can provide its Location; if not, one can provide the developers with a Description and/or some known Examples. Raw-datum.Getting is also composed of the Location of the datum, namely the URL of the file which contains it, and of the Acquisition-time of the datum (before-session, during-session, after-session).

Raw-datum.Using is composed of two elements: Used-by, which exists only for convenience and allows a bottom-up browsing of the data dependency graph, and Content, which allows the retrieval of the datum from its source. The category of the content of a datum can be Keyword or Value. These generic concepts allow the description of multiple track formats, from structured text files to databases and videos (see the next sections for examples). The Content element is in italics in Figure 3 to denote the already developed part of the UTL language. It constitutes the part of the meta-language which can be instantiated on the tracks of a specific learning environment.

The Content-datum Information Model
As for raw data, Content-datum.Defining is composed of a Title and an optional Description. Content data are the outcomes of a learning session.
Figure 3. The Raw-datum information model
Thus, they are always well identified, even unpredicted ones. Content-datum.Getting is then characterized by the Location of the datum, the Date of its production and the Actor who produced it. It is also characterized by at least one Traceable-concept of the scenario. Traceable-concept is in italics in Figure 4 to denote the already developed part of the UTL language. It constitutes the part of the meta-language which can be instantiated on a specific EML. The Content-datum.Using facet is composed of the Content of the datum, its Format and, as for raw data, the list of the data which use it.

The Additional-datum Information Model
Additional data are of multiple kinds (e.g., the predictive scenario, an ontology, a domain taxonomy, an academic curriculum); thus, Additional-datum.Defining adds the Type of the datum to its Title and its Description. An additional datum is well known and identified: Additional-datum.Getting refers only to the Location of the datum. The Additional-datum.Using facet is composed of the Content of the datum, its Format and the list of the data which use it.

The Intermediate-datum Information Model
Intermediate-datum.Defining is composed of a Title and an optional Description. Intermediate-datum.Getting characterizes the means for establishing the datum. It is mainly composed of the Components element, which defines the graph of dependencies of the datum (a derived datum is always defined with the use of primary data, i.e., raw, content or additional data, and/or derived data, i.e., intermediate data or indicators), and of the Method element. The getting method Type can be manual, semi-automatic or automatic. If a human intervention is required, one should define it with the help of the Role-involved element.
Figure 4. The Content-datum information model
Figure 5. The Additional-datum information model
Figure 6. The Intermediate-datum information model
If the method is semi-automatic or automatic, the supporting Tool has to be defined by its Location, if available, or by a Description and some Examples. We assume here that only one tool can be specified for an intermediate datum; if more than one is needed, several intermediate data have to be defined. The Intermediate-datum.Using facet is composed of the Content of the datum, its Format and the list of the data which use it.

The Indicator Information Model
The Indicator.Defining and Indicator.Getting facets are similar to the Intermediate-datum facets. The Indicator.Using facet is characterized by a Pedagogical-context element which defines the context of use and the purpose of the indicator. This pedagogical context is described by a Traceable-concept, as for content data, and by an Exploitation-purpose, performed by at least one Recipient-role. We have currently defined four Types for this exploitation (see Figure 1) – re-engineering, regulating, assessing, reflecting – but we consider this Type element as an open list. We will provide some use cases which highlight the possible uses of these models after having focused, in the next section, on the operational version of UTL.
Figure 7. The Indicator information model

UTL – Operational Part (UTL.1)
As already mentioned, our work focuses on re-engineering driven by models. We consider that each designer has his or her own representation model for the learning activity. In order to facilitate the comprehension of the analysis, the tools must take the designer's model into account and provide the results using the same model. In our experiments, we focus on a standard representation model, IMS Learning Design (Koper, Olivier, & Anderson, 2003), but we prefer to consider a meta-model in which all designers' models may be described. XML-Schema is an interesting candidate because a number of models are based on this meta-language. We currently have a project on the collaborative design of a representation model for learning scenarios (El Kechaï & Choquet, 2005). In this project, we plan to develop a collaborative editor based on XML-Schema; thus, one of the goals of this project is to design tools that can work on XML-Schema in order to interpret designers' models. Since the beginning of our experiments, we have used IMS-LD to describe learning scenarios, and IMS-LD has its own description in XML-Schema.
UTL.1, the operational part, has to be instantiated according to (i) the representation model (the designer's model) and (ii) the specific format of the log files. Because they have not been designed for this, existing representation models do not include tracking facilities, so UTL.1 is proposed to link tracks and designers' models through semantic data. At present, UTL is composed of two parts and is implemented in XML-Schema. It is a first attempt to operationalize the information model proposed in the first part of this article. Firstly, we identify the concepts of the representation language for which we could have (log) tracks. Secondly, we associate these traceable concepts with the specific description of the tracks, a description given by the tracks' format.

The Connection with the Representation Model
This part of UTL can be considered as an extension of the representation model. It is used to classify all the concepts of the representation model that are traceable. We will use this information in the second part of UTL to link the model and the tracks. This section has been designed to be as generic as possible, because we want it to be compatible with the majority of designers' models. In a representation model, many concepts are used to describe the pedagogical scenario; for instance, in LD, we have Activity, Learning object, Role, and so on. A Traceable-concept is a concept from which it is possible to track something; for instance, a resource and an activity are traceable concepts from which we can track the beginning, the end, and the duration. Figure 8 presents the information model of this part of UTL.

Figure 8. The meta-language UTL – representation part

The description of a Traceable-concept is composed of all its relationships with other Traceable-concepts, for instance: an Activity is realized by a Resource. The Title of the relationship brings more semantics to the interpretation of the context of the tracks. This concept is typically, in the context of an Abstract-scenario modeled with Learning Design, an activity or an environment. But it could also be an Enterprise concept, which is domain specific and cannot be reified with an EML, as defined in (Laforcade & Choquet, 2006). So the Type attribute refers to these two values: Enterprise and Abstract-scenario. The other information consists of the Observed Use, which allows the description of the relationship between the tracks and the traceable concept. We can say that a specific set of tracks is interesting because it describes the Evaluation (Title) of the Activity called Learning by Doing.
Examples based on this information model are given in the next part of the article.

The Description of Tracks
In order to work on the track itself, we need to identify it, or a part of it. Thus, we have defined another section in UTL: the track representation presented in Figure 9. This model is also generic, and we propose an implementation that should work with the majority of log formats. We have experimented with it on usual log files (described in section 4), and we still have to validate its use with XML log files and logs stored in databases. To manage each log format, we use two fields in our model: the Type, which refers to "Text", "XML", "Database", and perhaps others, and the Path, which is optional and contains the path to the specific data to describe. A path can be expressed in XPath or SQL; it is not used for classical log formats. In order to describe the location of data inside a string, we propose to use character positions and/or tokens. This section of UTL is useful for retrieving specific tracks, extracting values and bringing sense to each of them. We consider two categories of content in tracks. A Keyword is used to retrieve the track; it is a word (or a sentence) which is always present in the same kind of track. A Value depends on the learner; it may be the time spent reading, the name of the page read or the score of the evaluation exercise. The content locations are used to specify the position of the keyword or the value inside the track. The specific attributes for the specification of the content locations are the following:
• Title: used to name the content, to associate semantics (e.g., Date or Task);
• Begin: gives the first character position of the content;
• End: gives the last character position of the content (-1 for the end of the line);
• Delimiter: sets the delimiter used to break down the track into tokens;
• Position: gives the position of the token.
The Data field is used to store the value or to indicate the keyword. In an XML syntax, it is the textual content of the tag Content.

Figure 9. The meta-language UTL – track part
Example of Use and Instantiations
Example of Use
In this section, we consider a prescribed scenario in IMS-LD and tracks generated by Free Style Learning (FSL). In this case, a session is identified by the student's identifier, because for one session we have a set of log files which corresponds to the work of a single student. Here is an extract of one FSL log file:

[04/12/2002:03:18:55 +0952] [FreeApp] Intro gestartet
[04/12/2002:03:18:55 +0962] [FreeVideoPlayer] start() currentTime=0s
[04/12/2002:03:21:58 +0775] [FreeVideoPlayer] stop() currentTime=182.0s
[04/12/2002:03:22:38 +0982] [FreeVideoPlayer] pause() currentTime=0.0s
[04/12/2002:03:22:39 +0002] [FreeTextStudyManager] Standard-Init der Textstudy
[04/12/2002:03:22:39 +0012] [FreeTextStudyManager] in showactualPage vor if mit ende = false & shownNode: Heberger soi-même son site
[04/12/2002:03:22:39 +0022] [FreeNotesManager] [Store] Notiz zu Réaliser un serveur HTTP.Etude de textes.Heberger soi-même son site vorhanden?
[04/12/2002:03:22:39 +0032] [FreeNotesManager] Nein.
[04/12/2002:03:22:39 +0032] [FreeNotesManager] [NotesManager] NoteButton umschalten auf: Existiert nicht!
[04/12/2002:03:22:39 +0052] [FreeTextStudyManager] LOAD FILE
[04/12/2002:03:22:39 +0072] [FreeTextStudyManager] Lade TS-Datei: D:\FSL\Granulat\services\txtStudy\txt-0.htm
[04/12/2002:03:22:39 +0082] [FreeLinkManager] Setze LinkViewButton für Topic: txtStudy.Heberger soi-même son site auf false
[04/12/2002:03:22:39 +0153] [FreeApp] textStudy gestartet
First, we describe some data that can be extracted according to these two models. In the following example, we describe a track which represents the end of the use of a video player by the learner. A UTL description has been used to filter the log file and to extract the following track:

[18/12/2002:09:45:29 +0043] [FreeVideoPlayer] stop() currentTime=182.0s
So, we have obtained the following data:
Date of the track: 18/12/2002:09:45:29
Duration of the video: 182.0s
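The UTL description used for this filtering is not reproduced above. As an indication only, it might look like the following sketch; the element and attribute names, and the way keywords and values are located (here by delimiter and token position, assumed to be counted from 1), mirror the track part of the meta-language rather than the exact syntax of our schema.

<Track Type="Text">
  <!-- hypothetical sketch: the keywords identify "stop()" events of the video player -->
  <Content Category="Keyword" Title="Component" Delimiter=" " Position="3">FreeVideoPlayer</Content>
  <Content Category="Keyword" Title="Operation" Delimiter=" " Position="4">stop()</Content>
  <!-- the date and the elapsed playing time are extracted as values -->
  <Content Category="Value" Title="Date" Begin="1" End="26"/>
  <Content Category="Value" Title="Duration" Delimiter="=" Position="2"/>
</Track>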
In this example, we worked with a single student. In other experiments, we may have to track the activity of a group (especially in collaborative work). UTL is able to describe tracks if we have a single log file for all members – server log file – and also if we collect a set of log files, one per member – client log file. We just have to define in the designer model the concept of “group” and “member of group.”
Instantiation of UTL for IMS-LD
In our experiments, we have used IMS-LD as the representation model for the designer. In order to manage tracks according to this language, the following piece of code is an extract of the IMS-LD instantiation concerning Activity, Role, LearningObject and Resource:

<xsd:element name="Activity" type="TraceableConceptType" substitutionGroup="TraceableConcept"/>
<xsd:element name="Role" type="TraceableConceptType" substitutionGroup="TraceableConcept"/>
<xsd:element name="LearningObject" type="TraceableConceptType" substitutionGroup="TraceableConcept"/>
<xsd:element name="Resource" type="TraceableConceptType" substitutionGroup="TraceableConcept"/>
The next stage consists in instantiating the UTL-LD file with a specific scenario which we decided to analyse. This step is necessary to associate semantics to the tracks, that is to say, to link each track with the relevant object of the learning scenario. In our experiment, this file declares the relationships between all the activities and resources of the scenario.
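As an indication of what such a declaration could look like, here is a hypothetical sketch of a single relationship; the element and attribute names follow the information model of Figure 8 but not necessarily the exact schema, and the resource name is invented for illustration.

<TraceableConcept Type="Abstract-scenario" Name="Activity" Title="Learning by Doing">
  <!-- hypothetical sketch: the activity is realized by one resource of the scenario (name invented) -->
  <Relationship Title="is realized by">
    <TraceableConcept Type="Abstract-scenario" Name="Resource" Title="Resource-1"/>
  </Relationship>
  <!-- the Observed Use states why tracks about this concept are of interest -->
  <ObservedUse Title="Evaluation"/>
</TraceableConcept>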
Instantiation of UTL in FSL Log Format
After having prepared the data about the learning scenario, we have to describe the tracks' format according to the deployment platform, Free Style Learning. Our scenario was deployed on FSL, so we describe the format of the FSL tracks. As an example, we consider the tracks concerning
the management of the resource VideoIntro, which is an introduction to the course based on a video. Operations are start, stop, pause and play. We describe the keywords that are necessary to identify the track, for instance "Intro gestartet" for the beginning of the video, and also the values that have to be extracted, for instance the date of the track.
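The corresponding UTL description is not reproduced in full here. As an indication only, it might look like the following sketch, which reuses the character positions that appear later in Table 1; the element and attribute names (including the Concept attribute linking the track to the VideoIntro resource) are illustrative rather than the exact schema.

<Track Type="Text" Concept="VideoIntro">
  <!-- hypothetical sketch: the keywords identify the start of the introduction video -->
  <Content Category="Keyword" Title="Component" Begin="33" End="40">FreeApp</Content>
  <Content Category="Keyword" Title="Operation" Begin="42" End="57">Intro gestartet</Content>
  <!-- the date of the track is extracted as a value -->
  <Content Category="Value" Title="Date" Begin="1" End="26"/>
</Track>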
Use Scenarios
Our first need regarding usage analysis is track analysis. We have three years of logs from two different experiments. For each of these case studies, we have a prescribed scenario described in IMS Learning Design. All resources used by this scenario are indexed with LOM (LOM, 2002). We use our Usage Tracking Language to bring semantics to each track. The first step consists in the interpretation of the tracks according to the designer's model and the corresponding track semantic description. Next, the observed usage of the learning system is available for analysis.

Usage Analysis: Track Interpretation
Automatic track analysis first needs an automatic interpretation of the tracks. UTL is designed to add semantics to the content of the log files. We use it to filter the content of the log files, that is to say, to keep only the tracks that are considered relevant by the designer. A track is relevant if a description of it is given inside the UTL file. The second use of UTL consists in associating a specific type to each track and in extracting the values that are representative of the learner's activity. The result of this stage is a data structure which contains the interpretable tracks and which is shareable between analysis services. The data structure is also available for any researcher who wishes to propose new services.

Usage Analysis: Derived Data Processing
There are various ways to use the interpreted tracks. For instance, we may evaluate the use of a resource, or compare a learner's scenario with the predictive scenario. Moreover, with the same raw data, different interpretations could be made. For instance, the mails sent between the actors of a learning session could be parsed in order to find semantic markers which could reveal the social and affective roles in the group (Cottier & Schmidt, 2005), or could be visualized as an oriented graph between actors in order to measure the cohesion of the group (Reffay & Chanier, 2003). Using the semantic description of tracks allows us to define services in a declarative way, that is to say, to provide analysis services independent of a platform's track format. We are currently developing a first set of these services. Examples of analysis results are the following: rate of use of a resource, performance of a student, emergence of a role (e.g., leader), extraction of an observed learning scenario, and detection of a sequence of resource uses which has not been prescribed. To present some ideas on usage analysis, we will focus on three cases: a statistical datum, a result which has to be re-transcribed in the designer's model, and intelligent information detection.
A statistical datum. These data are, for instance, the rate of use of a resource, the average mark on the evaluation exercise, or the time spent on a particular activity (the shortest, the average, and the longest). We filter the tracks according to their semantics and make a small calculation on them. As an example, for the rate of use, a first solution is to count the students for whom we find at least one track about the use of the resource. In our experiments, we had a first engineering phase in which the designer declared a need for observation concerning the use of resources. We found that numerous students spent less than 15s on resources at the beginning of the scenario. In a second phase, called the re-engineering phase, we modified the observation tools in order to filter out resource uses of fewer than 15s, so as to detect the exploration period when students clicked everywhere.
Re-transcription in the designer's model. One of the main goals of re-engineering driven by models is to use the same representation model for the description of the predictive scenario by the designer as for the observed scenario built with the tracks generated by the learning system. In our first experiments, we worked with IMS-LD as the representation model. The interest of using a common model is the possible comparison between the different scenarios, which leads us to identify non-predicted usages of resources or incoherences in the sequence of activities. In one of our experiments (the one based on FSL), we observed that some students had used the evaluation exercise as a quiz at the beginning of the experiment: they had only navigated inside the list of questions in order to self-evaluate their knowledge (before the first activity of the learning session). This observation led the designer to propose two facets for the exercise, one for evaluation and another for a quiz. We consider two kinds of re-transcription of the observed scenario: the one generated from a single student's tracks, and a stereotypical scenario that represents a combination of all the student scenarios.

(a) Re-transcription of one student's observed scenario. First, we have to read the representation model in order to identify the core concept, such as the activity for IMS-LD. Next, we filter the tracks in order to represent this concept and all its components. The last step consists in organizing all the instances of the core concept in a sequence which corresponds to the observed scenario.

(b) Re-transcription of a stereotypical observed scenario. A stereotypical observed scenario corresponds to the combination of all the student scenarios. To build this scenario, we must have all the students' observed scenarios. Next, we compare the sequences of core elements (e.g., activities), and we compare each element in depth. We observe the percentages concerning the use or the position in the sequence of each element. A stereotypical scenario is a graph where each relation is qualified with the percentage of students who have chosen the corresponding direction.
Usage Analysis: From Raw-data to Indicators
Considering all the information models of the first part of this article, we propose in this section a scenario which uses interpreted tracks to compute indicators. This scenario is based on a design pattern taken from the framework of the DPULS project (DPULS, 2005), called "Playing Around with Learning Resources" (this design pattern is accessible at: http://lucke.univ-lemans.fr:8080/dpuls/login.faces). This pattern provides an approach to detect learners playing around with resources at the beginning of an activity. In this pattern, we propose a solution based on the computation of two indicators: the characterization of the sequence of resources and the characterization of the time of an activity. The meaning of the first indicator is the following: the sequence of resources attempted by a learner is valued as non-significant if the duration of each resource is less than a fraction (for us, 10%) of the Typical Learning Time defined for the relative resource. The second indicator says: the time of an activity is qualified as the beginning if the effective duration of the activity is less than a fraction (for us, 10%) of the Typical Learning Time of the activity. In Figure 10, the graph which represents the use of the data is presented. In this graph we can find the indicators, but also some intermediate data, additional data and raw data. UTL.1, as presented before, is able to identify and extract these raw data; an example can be found in the previous section. UTL.2, which includes the first section of this article, has to take into account the use of these raw data for the generation of indicators for the pedagogical designer.
Figure 10. Map of indicators and data used
The started time and the stop time of a resource are used to evaluate the duration of use of the resource. Additional data such as the typical learning time can be extracted from the prescribed scenario (for instance, field 5.9 of the LOM). It can also be given by the designer, such as the Playing Around Typical Learning Time of a resource: this is a percentage of the use time of a resource considered as a minimum time, under which the use cannot be taken into account. We now present the description of these data within our proposal. Table 1 presents the information table for the raw-datum called Started time of a resource, Table 2 the information table for the additional-datum called Typical learning time of a resource, Table 3 the information table for the intermediate-datum called Sequence of resources and, finally, Table 4, where Pc means Pedagogical-context, Tc Traceable-concept and Ep Exploitation-purpose, presents the information table for the indicator called The characterization of the sequence of resources. All these descriptions, expressed in an XML file, can be processed by a system in order to pre-calculate some data and to interact with an analyst or a designer for data that need a semi-automatic or manual method.

Table 1. Information Table for a Raw-Datum
D  Title: Started time of the video intro
   Description: This datum stores the time of the beginning of a video's use.
G  Acquisition-time: During the session
   Record-type: Log file
   Record-tool.Title: FSL methods for the generation of tracks
   Record-tool.Location: ~exp/StudentID/file.FSL
U  Content.Keyword: "FreeApp" from character 33 to 40; "intro gestartet" from character 42 to 57
   Content.Value: Date, from character in position 1 to 26
   Used-by: "Sequence of resources"
Table 2. Information Table for an Additional-Datum
D  Title: Typical learning time of the video intro
   Type: Number
   Description: This datum stores the estimated time for the resource of our scenario, which is an introduction by means of a video.
G  Location: In the metadata section of the scenario
U  Content: 182
   Format: Integer, time in seconds
   Used-by: "Sequence of resources"
Table 3. Information Table for an Intermediate-Datum
D  Title: Sequence of resources
G  Title: Retrieve and organize resources' starting tracks
   Component.Primary-Datum: "Started time of the video intro"
   …
   Method.Type: Automatic
   Method.Tool.Title: Calculation_Resources_Sequence
   Method.Tool.Location: Method of a java library for the project
U  Content: Too long for this article
   Format: List of couples (resource-name, started-time)
   Used-by: "The characterization of the sequence of resources"
Table 4. Information Table for an Indicator
D  Title: The characterization of the sequence of resources
G  Title: Evaluation of the relevance of a sequence of resources
   Component: "Sequence of resources"
   Method.Description: The sequence of resources attempted by a learner is valued as "non significant" if the duration of each resource is less than a fraction (for instance 10%) of the Typical Learning Time defined for the relative resource.
   Method.Type: Automatic
   Method.Tool.Title: Calculation_Relevance_Resources_Sequence
   Method.Tool.Location: Method of a java library for the project
U  Content: "Significant" or "Non significant"
   Format: String
   Pc.Tc: Scenario
   Pc.Tc.Type: Abstract Scenario
   Pc.Tc.Title: "Learning Web Server Programming"
   Pc.Ep.Type: Reengineering
   Pc.Ep.Recipient-role: Designer
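As an indication of what such an XML description could look like, the raw-datum of Table 1 might be expressed roughly as follows. This is a hypothetical sketch: the element names mirror the information model of the first part of the article, not an actual UTL.2 schema.

<Raw-datum>
  <Defining>
    <Title>Started time of the video intro</Title>
    <Description>This datum stores the time of the beginning of a video's use.</Description>
  </Defining>
  <Getting>
    <Collection-type>
      <!-- an automatic collection recorded by the FSL logging facilities -->
      <Automatic-collection Record-type="Log file">
        <Record-tool Title="FSL methods for the generation of tracks"/>
      </Automatic-collection>
    </Collection-type>
    <Location>~exp/StudentID/file.FSL</Location>
    <Acquisition-time>during-session</Acquisition-time>
  </Getting>
  <Using>
    <Content>
      <Keyword Title="Component" Begin="33" End="40">FreeApp</Keyword>
      <Keyword Title="Operation" Begin="42" End="57">Intro gestartet</Keyword>
      <Value Title="Date" Begin="1" End="26"/>
    </Content>
    <Used-by>Sequence of resources</Used-by>
  </Using>
</Raw-datum>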
The system which uses this extension is not currently developed; only the UTL version presented in the third section is usable.

Perspectives
The DGU (Defining, Getting and Using) model presented in this article is well suited for defining what the system has to track, based on the predictive scenario designed for a learning activity. For each datum, the designer
can define what to track, how to track it (i.e., the tracking means) and why it should be tracked (i.e., the semantics of the track chunk). Each datum can be combined with others in order to provide high-level indicators for the analyst or the designer. At the moment, a part of this model has been developed; we call it UTL.1, for Usage Tracking Language. We now have to extend UTL.1 into UTL.2 by including all the elements of DGU, as well as all the services needed to calculate indicators and intermediate data. Because of its meta-level, this language could also be used, after usage analysis, to define and highlight semantic links between predictive and descriptive scenarios.

Works such as (Seel & Dijkstra, 1997) have shown that teachers and trainers – who are the main potential designers of educational systems – have some difficulties in instructional design, especially regarding the explicitation and technical reification of their pedagogical intentions. We are defining rules which can be applied to the meta-model (e.g., the XML-Schema) of the instructional language used by a designer (for instance, Learning Design) in order to identify opportunities and observation possibilities (Barré & Choquet, 2005). These rules reason on the structure of the instructional language (datatypes, relations, and so on) and provide the designer with information on the needs of observation. These needs are relative to the concepts of the language and thus define the traceable concepts. Using these rules with UTL.2 could be a way to provide designers with a semi-automatic decision-support tool.

Our approach to student data capture is focused on automatic techniques driven by designer prescriptions. UTL presently lies outside the spectrum of both existing non-automatic techniques, such as interviews, and data-mining or machine learning ones. We think all these techniques, including ours, are complementary. We now have to operationalize and validate UTL.2 with all types of data (e.g., electronic, interviews, video). We have started a study with researchers specialized in usage analysis (with a Communication Science background), whose objective is to define when, why and how a designer has to make explicit the requirements for these techniques. We currently have two new experiments which will be used with UTL.2. These experiments lead us to consider new track formats and also new EMLs. In the Symba experiment, students have to organize by themselves the project management of a website development, using the Symba tool (Betbeder & Tchounikine, 2003); the scenario is described in IMS-LD. In the AEB experiment, we mobilize various actors of apprenticeship training for the collaborative and iterative development of an Apprenticeship Electronic Booklet (AEB). These actors are also the future users of the AEB: trainers, training managers, employers and apprentices. The AEB is a learning system where information concerning the apprentices' training progression is recorded. Its goal is to help them in the appropriation of their training and to give the trainers and the employers the possibility to evaluate their apprentices' knowledge acquisition, to perceive their progression in the training and to regulate it.
In this experiment, the representation model used is a set of use cases.

References
Barré, V., & Choquet, C. (2005). Language independent rules for suggesting and formalizing observed uses in a pedagogical re-engineering context. IEEE International Conference on Advanced Learning Technologies (ICALT'2005), 550-554.
Barré, V., El Kechaï, H., & Choquet, C. (2005). Re-engineering of collaborative e-learning systems: Evaluation of system, collaboration and acquired knowledge qualities. AIED'05 Workshop: Usage Analysis in Learning Systems, 9-16.
Bazsaliscza, M., & Naim, P. (2001). Data mining pour le web. Paris: Eyrolles Eds.
Berners-Lee, T. (1999). Weaving the web. San Francisco: Harper Eds.
Betbeder, M.-L., & Tchounikine, P. (2003). Symba: A framework to support collective activities in an educational context. ICCE'2003, 188-196.
Brocke, J. V. (2001). Freestyle learning – Concept, platforms, and applications for individual learning scenarios. 46th International Scientific Colloquium, 149-151.
Champin, P.-A., Prié, Y., & Mille, A. (2003). MUSETTE: Modeling USEs and tasks for tracing experience. Workshop 'From Structured Cases to Unstructured Problem Solving Episodes For Experience-Based Assistance', ICCBR'03, 279-286.
Corbière, A., & Choquet, C. (2004a). Re-engineering method for multimedia system in education. IEEE Sixth International Symposium on Multimedia Software Engineering (MSE), 80-87.
Corbière, A., & Choquet, C. (2004b). A model driven analysis approach for the re-engineering of e-learning systems. ICICTE'04, 242-247.
Cottier, P., & Schmidt, C. (2005). Le dialogue en contexte: Pour une approche dialogique des environnements d'apprentissage collectif. Revue d'intelligence artificielle, 19(1-2), 235-252.
DPULS (2005). Design patterns for recording and analysing usage of learning systems. Consulted May 2006, from http://www.noe-kaleidoscope.org
Egyed-Zsigmond, E., Mille, A., & Prié, Y. (2003). Club (Trèfle): A use trace model. 5th International Conference on Case-Based Reasoning, 146-160.
El Kechaï, H., & Choquet, C. (2006). Understanding the collective design process by analyzing intermediary objects. The 6th IEEE International Conference on Advanced Learning Technologies (ICALT'2006). Submitted for publication.
Grob, H. L., Bensberg, F., & Dewanto, B. L. (2004). Developing, deploying, using and evaluating an open source learning management system. Journal of Computing and Information Technology, 12(2), 127-134.
Hanson, S.-J., Kraut, R.-E., & Farber, J.-M. (1984). Interface design and multivariate analysis of UNIX command use. ACM Transactions on Information Systems (TOIS), 2(1), 42-57.
IA (2005). Interaction analysis. Consulted May 2006, from http://www.noe-kaleidoscope.org
ICALTS (2004). Interaction & collaboration analysis' supporting teachers & students' self-regulation. Consulted May 2006, from http://www.noe-kaleidoscope.org
IMS (2006). IMS Global Learning Consortium. Consulted May 2006, from http://www.imsglobal.org/
IMS/LD (2003). IMS Learning Design. Consulted May 2006, from http://www.imsglobal.org/learningdesign/index.html
Kaleidoscope (2004). Consulted May 2006, from http://www.noe-kaleidoscope.org
Kaleidoscope (n.d.). State of the art: Interaction analysis indicators. Available at http://www.rhodes.aegean.gr/LTEE/Kaleidoscope-Icalts/
Koper, R., Olivier, B., & Anderson, T. (2003). IMS learning design information model (version 1.0). IMS Global Learning Consortium, Inc.
Laforcade, P., & Choquet, C. (2006). Next step for educational modeling languages: The model driven engineering and re-engineering approach. The 6th IEEE International Conference on Advanced Learning Technologies (ICALT'2006). Submitted for publication.
Lejeune, A., & Pernin, J.-P. (2004). A taxonomy for scenario-based engineering. Cognition and Exploratory Learning in Digital Age (CELDA 2004), 249-256.
LOM (2002). Draft standard for learning object metadata. IEEE.
Mostow, J. (2004). Some useful design tactics for mining ITS data. Proceedings of the ITS2004 Workshop on Analyzing Student-Tutor Interaction Logs to Improve Educational Outcomes, 20-28.
Reffay, C., & Chanier, T. (2003). How social network analysis can help to measure cohesion in collaborative distance-learning. Proceedings of the Computer Supported Collaborative Learning Conference (CSCL'2003), 343-352.
Seel, N., & Dijkstra, S. (1997). General introduction. Instructional design: International perspectives, 2, 1-13. Hillsdale, NJ: Lawrence Erlbaum Associates.
Teubner, A., & Vaske, J. (1988). Monitoring computer users' behaviour in office environments. Behaviour and Information Technology, 7, 67-78.
TRAILS (2004). Personalised and collaborative trails of digital and non-digital learning objects. Consulted May 2006, from http://www.noe-kaleidoscope.org
Zheng, C., Fan, L., Huan, L., Yin, L., Wei-Ying, M., & Liu, W. (2002). User intention modeling in web applications using data mining. World Wide Web: Internet and Web Information Systems, 181-191.
Results from Action Analysis in an Interactive Learning Environment

OLIVER SCHEUER
German Research Center for Artificial Intelligence DFKI, Germany
[email protected]

MARTIN MÜHLENBROCK
European Patent Office, The Netherlands
[email protected]
ERICA MELIS
German Research Center for Artificial Intelligence DFKI, Germany
[email protected]

Recently, there has been growing interest in the automatic analysis of learner activity in web-based learning environments. The approach and system SIAM (System for Interaction Analysis by Machine learning) presented in this article aim at helping to establish a basis for the automatic analysis of interaction data by providing a data logging and analysis system based on a standard database server and standard machine learning techniques. The contribution is the integration of components that are appropriate for large amounts of data. The analysis system has been connected to the web-based interactive learning environment for mathematics, ActiveMath, but is designed to allow for interfacing to other web-based learning environments as well. The results of several uses of this action analysis tool are presented and discussed; they indicate potential for further development and use.
Introduction

Recently, there has been growing interest in the automatic analysis of learners' interaction data with web-based learning environments. This is largely due to the increasing availability of log data from learning environments, and in particular from web-based ones. The potential outputs include the detection of regularities and deviations in the learners' or teachers' actions as well as
the building of models that can predict learner features from log data. The objective is to use those outputs to support teachers and learners by providing them with information that helps them manage their learning and teaching, or to use the information for adaptation actions of e-learning systems. Commercial systems such as WebCT, Blackboard, and LearningSpace already give access to some information related to the activity of the learners, including some statistical analyses, and provide teachers with information on course attendance and exam results. While this information is already useful, it only represents the tip of the iceberg of what might be possible with advanced technologies.

This emerging research area (i.e., the automatic analysis of learner interaction data) is related to several well-established areas of research, including intelligent tutoring systems, web mining, statistics, and machine learning, and can build upon results from these fields to achieve its objectives. In contrast to intelligent tutoring systems, learner interaction analysis does not rely on models of the learner or of the domain knowledge, since these are costly to build and maintain. In this regard, learner interaction analysis is comparable to website data mining, but with a specific perspective on learning settings and with the availability of pedagogical data that is usually not available in web mining applications, which are mostly based on click-through data only. Click-through data streams only allow for a rather shallow analysis, but with the inclusion of pedagogical data, more advanced techniques can be adopted from the field of machine learning, for example to learn models of individual behavioural and non-observable variables that can predict the non-observable variables from the behaviour of future students. Although a number of open questions have already been tackled (Arroyo, Murray, & Woolf, 2004; Heiner, Beck, & Mostow, 2004; Merceron & Yacef, 2003; Merceron, Oliveira, Scholl, & Ullrich, 2004; Mostow, 2004; Oliveira & Domingues, 2003; Zhang & Lu, 2001), there is not yet a systematic approach to analysing interaction data from huge learner action logs, nor are there common architectures.

The approach presented in this article aims at helping to establish a basis for the automatic analysis of interaction data by developing a data logging and analysis system based on a standard database server and standard machine learning techniques. This work was conducted in the context of the iClass project, which aimed at developing an intelligent, cognitive-based e-learning system, with the goal of enhancing the iClass system with profiling capabilities. Because the iClass system was still under development when this research was carried out, our analysis system has been connected to the web-based interactive learning environment for mathematics ActiveMath, which provides the log files. However, the system is designed for interfacing to other web-based learning environments, too, which will enable us to connect it to the upcoming iClass system. It has been tested in a medium-scale experiment
in which four classes of a secondary school participated on a weekly basis throughout a school term of five months, as well as in a large-scale experiment in the context of an introductory mathematics course at a UK-based University.

This article is organized as follows. In the following section, the SIAM system will be described, which is comprised of a learning environment, a logging component, and an analysis component. Subsequently, two studies will be presented: in the first one, the system's capabilities to estimate students' performance and gender were investigated, and in the second one the relationship between students' behaviour and cognitive style.

The Action Analysis System SIAM

The SIAM system is comprised of three major parts, that is, a learning environment, an action logging component, and an action analysis component (see Figure 1). These system parts will be described in more detail in this section. In addition to these components, there are also three data repositories involved in storing and providing information: a learning material database, a set of user log files, and a database containing logs and possibly additional user data and context data.

Figure 1. System architecture

The analysis subsystem has been implemented using standard technology such as Java and MySQL, which are available for a number of platforms and operating systems, together with the suitable drivers for database
connectivity. In addition, the Analyzer is based on the Weka (Witten & Frank, 1999) and YALE (Fischer, Klinkenberg, Mierswa, & Ritthoff, 2002) toolkits, which provide tools for visualizing and exploring data as well as means for integrating machine learning functionality into applications. As a starting point, the web-based learning environment ActiveMath has been used as a testbed for developing and testing the action logging and analysis components. However, these components have been designed to be mostly independent of a specific learning environment, so that the same logging and analysis functionality can be provided to other learning environments.
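To make the integration of standard machine learning functionality more concrete, the following minimal Java sketch trains a C4.5-style decision tree with the Weka toolkit on a feature file exported from the log database. The file name and the assumption that the class attribute is the last column are hypothetical illustrations, not the actual Analyzer code.

```java
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AnalyzerSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical ARFF export with one row of aggregated features per student.
        Instances data = DataSource.read("student_features.arff");
        // Assume the attribute to be predicted is the last column.
        data.setClassIndex(data.numAttributes() - 1);

        // Train a C4.5-style decision tree (Weka's J48 implementation)
        // and print it in human-readable form.
        J48 tree = new J48();
        tree.buildClassifier(data);
        System.out.println(tree);
    }
}
```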
ActiveMath Learning Environment

ActiveMath is a web-based learning environment that dynamically generates interactive courses adapted to the student's goals, preferences, capabilities, and prior knowledge (Melis et al., 2001; Melis et al., 2006). The content is represented in an XML-based knowledge language for educational contexts, which greatly supports reusability and interoperability of the encoded content. ActiveMath supports individualized learning material in a user-adaptive environment, as well as active and exploratory learning through (mathematics) service tools and feedback (see Figure 2). For different purposes and for different users, the learning material and its presentation can be adapted: the selection of the content, its organization, and the means for supporting the user have to be different for a novice and an expert user, for an engineer and a mathematician, and for different learning situations such as a quick review and a full study. Since there is no way of knowing in advance the goals, the profile, and the preferences of any user when designing the system, ActiveMath builds on adaptive course generation.

One component of ActiveMath – its event framework – is especially relevant to the SIAM system since it provides the information to be logged. The event framework (Melis et al., 2006) realizes a publish-subscribe scenario and governs the asynchronous communication between system components and even the communication of remote services. Events are a mechanism for a powerful and flexible, yet rather loose, integration of components. An example of event publication is the following: when the learner finishes working on an exercise, the exercise subsystem issues an event. The event carries information describing the learner, the identifier of the exercise, the success rate, and the time stamp of the event. Listeners that subscribe to such an event can be the learner model as well as the suggestor of the tutorial component. A component that publishes events is called an event source. A component that subscribes to the events published by an event source is called a listener, which receives event messages from the event source. In contrast to a full-fledged messaging model, events remain anonymous rather than being sent from a specific sender to a specific recipient: when publishing an event, the event source is usually not aware of who is listening
to the events (only the module managing the subscriptions is). Moreover, the listener usually does not care which component or module created the event; it only knows where to subscribe to the events it is interested in. The Action Logging Component receives the trace of user actions by subscribing to the event framework for relevant events. To return to the example above, when the learner finishes working on an exercise, the generated event will be received by the Action Logging Component.

Figure 2 shows the ActiveMath user interface.

Figure 2. ActiveMath user interface

The left panel shows the table of contents (TOC) of the currently selected book (in ActiveMath, the term book is used as a metaphor for a course). The contents are organized in a hierarchy with expandable and collapsible nodes. The lowest level is constituted by single book pages, which can be selected. A colored bullet on the left-hand side of each book page item indicates the system's belief in the student's mastery of the respective contents. To the right of the TOC, the currently selected book page is presented. Each page consists of a sequence of learning items, which may be reading material, exercises, or interactive examples. Exercises are displayed in a separate window when the Start exercise link is clicked. Relevant concepts occurring in the texts are hyperlinked; their selection opens a window containing a concept definition or other additional information. In the upper-right of each
displayed learning item is a Note icon, through which students can access the notes functionality. Students can use it to annotate learning items with private or public comments, or to read already existing notes. In the lower part of the page, a previous and a next button are located. As an alternative to the TOC, from which pages can be accessed based on a hierarchical overview, these buttons allow movement through the contents in a predefined linear sequence. Finally, in the upper part of the screen, the menu bar is located, which offers functionalities that are independent of specific contents and which are generally available. This includes a Menu link to return to the main page where books can be selected, a Dictionary link which opens a window where search queries can be submitted, and a Logout button to finish the current session.

Action Logging Component

For each learner, the ActiveMath environment generates an online log that lists user actions in the learning environment in terms of general information such as time, type of action, user name, and session number, as well as specific information including which page has been presented to the user, which item has been seen by the user, and which exercise has been tackled and solved or not solved. The Updater (see Figure 1) receives event information on the users' actions from the learning environment, and transforms every user event into entries in one or more corresponding database tables. Usually, the Updater receives the information online from the event queue, but it can also read in files with log data that have been generated in an offline mode.

The Log Database (see Figure 1) is at the center of the SIAM system. It contains not only representations of the raw data in the user logs (see Table 1), but also tables that hold the results of the analysis. Moreover, it contains tables for additional background knowledge concerning the users, context, or courses, among others (see Table 2). The basic level of the database, which corresponds to the raw log data, is organized in tables that represent generic event information as well as event-specific data. The structure of these tables has been designed to follow the event specification closely, since this allows for simpler updating operations when the event subsystem is changed or replaced by another system. Table 1 lists the basic event tables together with their fields and a short description.

Table 1
Log Database Schema for Basic Level
• User logs into the system; start of a new session
• User logs out of the system; end of a session
• System presents a requested book page to the user
• User starts an exercise
• User submits an input for an exercise
• Exercise is finished
• System changes the mastery level of a user
• Presentation of a learning item to the user
• Indicates changes of some user meta data
• Gives a more fine-grained resolution (on item level) of the user's focus (uses information of an eye-tracker)

In addition, as shown in Table 2, the database includes tables that hold additional information on the users and sessions, such as gender and holiday periods, respectively, as well as tables that are derived from these by means of database queries.

The Updater also provides some data completion functionality. Every now and then, the information in some tables is not complete. For instance, most users do not log out of the learning system explicitly by using the button provided in the user interface, but simply close the browser or
shut down their computer. In this case, usually no logout event is generated. The corresponding logout table is therefore enhanced with information that is derived from the other events the user created and from heuristics concerning pauses and opening hours, among others. This information is automatically added to the table, but is marked as derived information so that it can be distinguished from the original log data.
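As an illustration of this logging step, the sketch below shows how an Updater-style component could store one kind of event in a MySQL table via JDBC. The event class and the table and column names are hypothetical; the real component subscribes to ActiveMath's event framework and uses the SIAM schema instead.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Timestamp;

// Requires a MySQL JDBC driver on the classpath.
public class UpdaterSketch {

    // Hypothetical representation of an "exercise finished" event.
    static class ExerciseFinishedEvent {
        String userId;
        String exerciseId;
        double successRate;
        long timestamp;
    }

    private final Connection conn;

    public UpdaterSketch(String url, String user, String password) throws Exception {
        this.conn = DriverManager.getConnection(url, user, password);
    }

    // Called whenever the logging component receives an event from the queue.
    public void onEvent(ExerciseFinishedEvent e) throws Exception {
        String sql = "INSERT INTO exercise_finished_event "
                   + "(user_id, exercise_id, success_rate, event_time) VALUES (?, ?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, e.userId);
            ps.setString(2, e.exerciseId);
            ps.setDouble(3, e.successRate);
            ps.setTimestamp(4, new Timestamp(e.timestamp));
            ps.executeUpdate();
        }
    }
}
```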
Since log files can grow very large, a tool intended for realistic usage (rather than for small-scale academic purposes only) needs to be built on a database. For us this was a major design decision, but only a few research prototypes developed so far take this need into consideration.
Action Analysis Component

The Analyzer (see Figure 1) performs data aggregation and evaluation by means of queries to the log database. It also incorporates a number of machine learning methods for automatically analyzing the data, taking the data from the Log Database as input. If needed, adjustments and preferences can be entered by the engineer who is running the analysis. In addition to providing better insight into the underlying relationships in the data, the results of the analysis can be used for the prediction and classification of future sessions.

Up to now, the Analyzer is not an integrated, fully automated component but consists of a set of SQL-query scripts that preprocess the raw action data, and the machine learning toolbox YALE (Fischer et al., 2002) to execute the actual machine learning analysis. YALE offers a wide range of operators for performing data pre-processing (discretisation, filtering, normalisation and others), learning (more than 100 learning algorithms), validation (e.g., cross-validation, several performance evaluation criteria) and other analysis-relevant tasks. In the following, a selection of the learning schemes used in the Action Analysis Component is presented:

Decision tree learner: Many machine learning methods provide their output in an intelligible, human-readable form. For instance, methods for generating decision trees from data, such as C4.5 (Quinlan, 1993), allow for a tree-shaped representation of the learning results. A decision tree is constructed by the algorithm first selecting an attribute to place at the root node of the tree and making one branch for each possible value. This splits up the example set into subsets, one for every value of the attribute. The attribute is selected in a way that maximizes the information gain of the chosen attribute. This process is repeated recursively for each branch, using only those instances that actually reach the branch. If at any time all instances at a node have the same classification, the development of that part of the tree is stopped.

Rule learner: Rule learning algorithms generate a set of prediction rules, each consisting of a class value and its condition. The PRISM rule learner, for instance, constructs rules in the following way: after choosing a class value, the algorithm starts with an empty rule and iteratively adds new conditions of the form 'attribute X has value Y.' Each added condition narrows the scope of the rule because fewer training instances will match the extended condition. On the other hand, the attribute-value pairs are chosen in a way that increases the accuracy of the rule (a higher share of the matched instances will be correctly classified). If the rule is perfect (100% accuracy), the algorithm goes on to construct the next rule. To cover all training
instances, several rules for one class value might be necessary. This procedure is repeated for all class values.

Naïve Bayes: Naïve Bayes is a widely used probabilistic classifier based on Bayes' Theorem. It assumes that all attributes are independent (hence naive). Dependencies between attributes might compromise the quality of the computed models.

Support Vector Machine (SVM): Support Vector Machine algorithms try to find hyperplanes in the attribute space which optimally separate instances of different classes. Optimality is achieved by maximising the margins between classes (the margin is the minimum distance between class instances and the separating hyperplane). Because more complex patterns cannot be captured with a linear separation, SVMs apply the kernel trick: instead of changing the SVM core algorithm, the attribute space is transformed by defining a non-linear kernel.

Boosting algorithms: Boosting algorithms (e.g., AdaBoost) are meta-algorithms which build on so-called weak learners, that is, rather simple learning methods, for instance a Decision Stump algorithm (which computes a 1-level decision tree). The boosting algorithm iteratively applies the weak learner to the training instances to compute a set of predictive models. Initially, all instances are equally weighted. In each iteration, the weights of the falsely classified instances are increased and those of correctly classified ones decreased. In the subsequent iteration, the learner will try to find a model which takes the falsely classified instances more into account. The complete model of the whole boosting procedure combines all the computed simple models, weighted by their accuracy.

Problems and Solutions

Although the SIAM system presented here considerably facilitates the process of action log analysis, there are still some open questions and problems to be tackled in future work. Up to now, pre-processing is executed in a semi-automated way: before starting the analysis, the existing SQL scripts are adjusted according to the purpose of the current investigation and then applied by the Analyzer component to the raw data. This indicates further potential for automation: it has to be found out which usage aspects are generally relevant to the analysis process, independent of the specific purposes of individual investigations. For example, students' success rates, the number of pages read, or their overall online time will play a role in most investigations. The computation of such aspects could be completely automated. The computation of aspects which are specific to individual investigations will still need manual adjustments. A further candidate for automation is the inclusion of external, non-behavioural data which are not contained in the action logs (e.g., gender, questionnaire data, pre- and post-test results). Currently, this essential infor-
mation is manually added to the database from spreadsheets provided by teachers. It has to be investigated which interfaces can be offered to teachers to feed the database automatically. There is ongoing work concerning the inclusion of pedagogical and domain data. State-of-the-art e-learning systems use learning objects which are annotated with pedagogical or domain-specific metadata. Including this information will allow more sophisticated analyses. For example, performance can be analysed for different topics or different levels of difficulty.
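A sketch of how such generally relevant aspects could be computed automatically is given below: a single SQL aggregation over a hypothetical event table yields the number of exercises finished and the success rate per student. The connection URL, table, and column names (including the assumption that success is stored as 0/1) are illustrative assumptions, not the actual SIAM schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PreprocessSketch {
    public static void main(String[] args) throws Exception {
        // Aggregate per-student usage aspects directly in the log database.
        String sql = "SELECT user_id, "
                   + "       COUNT(*) AS exercises_finished, "
                   + "       AVG(success) AS success_rate "   // success assumed to be 0 or 1
                   + "FROM exercise_finished_event "
                   + "GROUP BY user_id";

        try (Connection conn = DriverManager.getConnection(
                 "jdbc:mysql://localhost/siam_log", "analyst", "secret");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                System.out.printf("%s: finished=%d success=%.2f%n",
                    rs.getString("user_id"),
                    rs.getInt("exercises_finished"),
                    rs.getDouble("success_rate"));
            }
        }
    }
}
```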
Related Work

Merceron and Yacef (2005) present a case study of how educational data can be mined to help teachers understand their students' learning and to promote students' self-reflection. The analysis was based on data from the Logic-ITA system, a web-based tutoring tool for practising logical proofs. The conducted analyses included an association rule algorithm that can be used to identify sets of mistakes that often occur together, and a decision tree algorithm to predict final exam marks given students' prior usage behaviour. They used their tool for advanced data analysis in education (TADA-Ed) (Benchaffai, Debord, Merceron, & Yacef, 2004) to carry out the analyses.

There are a number of tools which are more concerned with an appropriate presentation of student action data than with automated analysis. For instance, the LISTEN Reading Tutor is intended to help children learn to read. The system displays stories, sentence by sentence, which children then have to read aloud. Children's utterances as well as other student-tutor interactions are logged into a relational database. Mostow et al. (2004) present a tool to browse the student-tutor interaction data of the LISTEN Reading Tutor, which are presented in a hierarchical tree structure. The Data Recording and Usage Interaction Analysis in Asynchronous Discussions System (D.I.A.S.) presented by Bratitsis and Dimitracopoulou (2005) is intended to improve asynchronous discussions. The usage data is stored in a database system. From the raw data, meaningful statistics (so-called interaction analysis indicators), which give an account of passive and active participation and thread initiations, are extracted by means of SQL queries. These indicators can be presented to discussion participants, but also to monitoring persons such as teachers, by a visualisation module. Discussion participants may benefit from an increased awareness of their own actions (metacognition) as well as those of other participants. Teachers can identify problems and, if necessary, intervene.

Using Action Analysis

The Action Analysis System has been investigated in mathematics courses in a secondary school and at a UK-based University. The objectives of those tests have been (1) to collect experience with the SIAM system and to find possible ways to improve it, and (2) to conduct the actual analyses to get some insight into
how students use ActiveMath and to uncover relationships between individual factors (e.g., gender and cognitive style), behavioural factors, and external factors (e.g., teacher).
Estimating Performance and Gender

The SIAM system has been tested in a medium-sized experiment in a mathematics course in a secondary school. About 70 students from three different courses used the learning environment for a period of five months. A further course of about 20 students was taught the same subject but in the traditional classroom manner. The other three courses used the ActiveMath learning environment on a weekly basis in two-hour lessons. During the online course, each class was split into two subgroups using different computer rooms. Many students were already familiar with computers, but a considerable number needed further instruction even for basic operations such as login.

A preliminary evaluation of the logged data after the first couple of sessions showed some problems in the quality of the data. For instance, instead of registering with the ActiveMath system only in the very first session and using the created user account thereafter, a large number of students created a new account, including a new user name, for each session, which makes longitudinal analysis of the data difficult. The problem was resolved by having the students create only one account and making the registration procedure inaccessible for them after that.

Figure 3 provides an overview of the data, showing the number of events in relation to the hours of the day.

Figure 3. Number of user actions in relation to hours of the day

Clearly, most events were created during lesson hours between 9 am and 2 pm (14 h), but some events were
generated earlier or later in the day. Some of them are due to a small number of students using the system in off-time, and most of them are due to teachers and system administrators preparing or evaluating the system.

At the end of the term, a written post test was administered to the students to assess what they had learnt. The results were added to the database manually, as well as some further information on gender, teacher, and so forth. Further information was semi-automatically generated by the Analyzer from the log data and added to the database. For each student, the information in Table 2 was gathered for further analysis.

Table 2
Data From Manual Input and Gathered Automatically
• Gender (manual input)
• Class (manual input) – course, each comprised of about 20 students
• Teacher (manual input) – each class has been split into two subgroups, with each being taught by another teacher
• Post test – results in the post test done in writing
• Exercises started
• Exercises finished
• Successful exercises
• Reading actions
• Solving actions
• Dictionary – whether the student used the dictionary for lookup
• Integration – whether the student is handicapped
• Off time – whether the student accessed the learning environment beyond lesson hours

For anonymity reasons, the students used arbitrary user names in the learning environment, and they were to give these user names also in the post test. However, in one course the students put down their real names on the test sheets, a fact which made linking to their log data impossible. Finally, 25 student records were complete and clean enough to be used in the further analysis.

Figures 4 and 5 show the decision trees that were generated for characterizing the attributes post test (with values low, medium, and high) and gender (with values male and female), respectively. The decision tree for post test (see Figure 4) shows that for predicting the result the teacher is most influential, that is, with teachers Mr. E. and Mr. G. the post test result is expected to be low. With Mr. K., however, the test result is high if the student is in class 6a, low if the student is in class 6b and male, and medium if the student is in class 6b and female. Similarly, the decision tree for gender (see Figure 5) depicts that when the post test result is low or medium the user would be male or female, respectively, and if it was high, the reading activity would indicate a male user if it was low and a female user if it was medium or high. This is an interesting result that can be interpreted as indicating that female users were more successful in the test and worked harder on the reading material. Importantly, these attributes have solely been selected by the learning algorithm from the ones in Table 2, which means that other attributes such as successful exercise solving and off-time system usage were of minor relevance.

Figure 4. Decision tree for post test and corresponding confusion matrix

Figures 4 and 5 also give the quality of the decision trees in terms of confusion matrices. A confusion matrix displays the result of testing the decision tree with data as a two-dimensional matrix with a row and a column for each class. Each matrix element shows the number of test examples for which the actual class is the row and the predicted class is the column. Good results correspond to large numbers down the main diagonal and small, ideally zero, off-diagonal elements. Hence the confusion matrices in Figures 4 and 5 indicate not ideal, but quite good decision trees.
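For readers who want to reproduce this kind of evaluation, the following hedged Java sketch shows how a confusion matrix such as those in Figures 4 and 5 can be obtained with Weka by cross-validating a C4.5-style tree. The ARFF file and the attribute name post_test are hypothetical stand-ins for the data in Table 2, not the authors' actual analysis setup.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ConfusionMatrixSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical export: one row per student with the attributes of Table 2.
        Instances data = DataSource.read("students.arff");
        data.setClassIndex(data.attribute("post_test").index()); // class: low/medium/high

        // 10-fold cross-validation of a C4.5-style tree (Weka's J48).
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));

        System.out.println(eval.toSummaryString());
        System.out.println(eval.toMatrixString("Confusion matrix (rows = actual class)"));
    }
}
```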
Figure 5. Decision tree for gender and corresponding confusion matrix

CONCLUSION
In this study, decision trees were computed that are capable of predicting gender and performance. These analyses indicate possible uses of the SIAM system, although, due to the small sample size, the expressiveness and generality of the presented results are limited. One reason for this is the high number of instances which had to be excluded from the analysis, which underlines the importance of careful data preparation.

Estimating Cognitive Styles

A further study was conducted in the context of an introductory mathematics course for computer science students at a UK-based University, with around 300 participating students. Its main objective was to investigate the relationship between students' cognitive styles and their observable behaviour in the ActiveMath system by means of machine learning.
Introduction: Cognitive styles

Cognitive styles describe an "individual's preferred and habitual approach to organising and representing information" (Riding & Rayner, 1998). Among the considerable number of existing style theories, especial-
ly Witkin's concept of field dependency attracted attention in the scientific community and was the subject of numerous studies. Field-dependent individuals are attributed with the tendency to rely on an external frame of reference (the field) when tackling a task, whereas field-independent ones make more use of an internal mental model. As an implication for instruction, field-dependent students may benefit to a larger extent from external help and guidance, whereas field-independent ones can take advantage of instructional settings which allow a higher degree of freedom.

The advent of hypertext gave rise to a new strand of research which investigated how subjects with different cognitive styles cope with differently structured environments. Hypertext environments provide a nearly ideal testbed for examining these relationships: field-dependent and field-independent subjects may show differences in navigation behaviour and tool use, and their performance may differ depending on the characteristics of the underlying hypertext system (linear vs. non-linear structure, offered tools, offered navigational support). Chen and Macredie (2002) give a comprehensive overview of empirical studies concerning the relationship between field dependency and hypermedia navigation and their findings.

What can be the benefit of knowing students' cognitive style? There are a number of studies which address this question by comparing scores of subjects performing a task in an environment which corresponds to their cognitive style (matching condition) with scores of subjects in a non-matching environment (Ford & Chen, 2001). The results essentially support the hypothesis that matching students' style does have a positive effect on performance. Therefore, cognitive styles have gained some traction in the adaptive hypermedia community: tailoring content structuring, presentation style and navigation support to individuals' cognitive style may be a promising approach to improve usability and, in the area of hypertext learning, to increase students' learning gains. Traditionally, cognitive styles are determined by means of questionnaires and psychological tests. A machine learning model could derive styles directly from behavioural data and would make the application of additional instruments dispensable.

Pre-study: Hypotheses Building

The quality of a machine learning model is strongly dependent on the set of attributes it is generated from. For the purposes of this study, attributes should reflect behavioural aspects which are relevant for the decision between field independence and field dependence, that is, attributes in which field-independent subjects significantly differ from field-dependent ones. Therefore, a pre-study was conducted to review empirical findings reported in the relevant literature. We found evidence that the following aspects are linked with field dependence: linearity of navigation, revisita-
tion behaviour, material coverage, pace, home page visiting behaviour, use of hyperlinks, exercise behaviour, and tool usage. The analysis will show whether these results can be replicated or not.

Experimental Setting

Around 300 students of a UK-based University participated in an introductory course in mathematics for computer scientists. For one part of the course, the instruction was based on the ActiveMath system. The computer-mediated contents covered the topics functions, matrices, and graphs. ActiveMath was used partly in supervised tutorial sessions and partly at students' own discretion. The supervised sessions covered four tutorial weeks: week 1 was mainly devoted to getting to know the system, weeks 2 and 3 to learning the material, and week 4 to revision for exam preparation. Additionally, students were asked to take part in the Cognitive Style Analysis (CSA), a computerised test to determine their cognitive styles. Low test scores correspond to a field-dependent (FD) cognitive style, high scores to a field-independent (FI) cognitive style.

ANALYSIS AND RESULTS
Based on the results of the pre-study, a set of attributes describing students' behaviour in the ActiveMath system was computed for each supervised session (see Table 3; the hypotheses they refer to are given in parentheses). These data, combined with the CSA results of the respective students, constituted the input for the machine learning algorithms. In a first processing step, inappropriate data was filtered out (students not taking part in the CSA test, and outliers in terms of extremely short online time (< 20 minutes), small number of pages visited (< 3), or small number of exercises finished (< 3)).

Table 3
Attributes Extracted From Students' Action Log Traces
• tocRate – ActiveMath offers two means for navigating through contents: students can either select pages directly in the TOC panel, or use the previous and next buttons. This attribute gives the ratio between TOC and previous-next moves. (Linearity)
• Stratum – a graph-theoretical measure indicating the linearity of a student's navigation graph. A detailed description of its computation and the rationale behind it is provided by McEneaney (2001). (Linearity)
• Compactness – the non-linear counterpart to the stratum measure. It quantifies the degree of connectedness of the student's navigation graph (see also McEneaney, 2001). (Linearity)
• Recurrence rate – the average number of times the student returns to a page already seen. (Revisitations)
• Home page rate – number of home page visits divided by the number of book page requests. (Home page)
• Notes creation – ActiveMath offers a 'Notes' function for annotating learning items. These notes can also be made public and are the main communication vehicle in ActiveMath. The attribute is given relative to the number of learning items seen. (Tool usage)
• Notes access – amount of accessed notes relative to the number of learning items seen. (Tool usage)
• Number of search queries. (Tool usage)
• Key terms occurring in the content material are hyperlinked. Clicking these hyperlinks opens a window containing additional information, for instance a concept definition. The attribute gives the number of clicked key terms relative to the number of pages visited. (Hyperlinks)
• Average number of attempts needed to solve an exercise (aborted exercises are excluded). This attribute at the same time gives an account of exercise performance. (Exercises)
• Exercises can be aborted by closing the exercise window. This attribute gives the relative amount of non-aborted exercises. (Exercises)
• Exercise tackling rate – number of exercises tackled relative to the number of exercises seen. (Material coverage)
• Reading material and exercises were located on different pages. This attribute gives the absolute number of reading material pages visited. (Material coverage)
• Time spent on exercise pages divided by the number of exercise steps. (Pace)
• Typically, the first attempt needs more time than subsequent attempts. This attribute gives the median time between exercise start and the first submitted answer. (Pace)
• Median time per book page (only reading material). (Pace)

Several machine learning schemes were applied in order to compute a predictive model (Naïve Bayes, C4.5 decision tree, Support Vector Machine (SVM), PRISM rule learner, AdaBoost with Decision Stump). Because no satisfactory result could be achieved (all models were outperformed by a simple choice of the most frequent class, with 43% accuracy), a series of variations in the machine learning setup was conducted (e.g., analysing every tutorial week alone, several other preprocessing steps, and variation of the parameters of the machine learning algorithms). The varied setups did not lead to better performance. Pearson's correlation coefficient r was computed to check whether the relationships between field dependence and behaviour hypothesised in the pre-study could be confirmed. The application of Pearson's correlation coefficient identified only three significant relationships (p < 0.05), namely positive correlations between field independence and notes creation, notes access and
exercise tackling rate. Also, these results cannot be regarded as convincing because they are not constant over time (only observable in one of four weeks).
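The correlation check itself is straightforward; the following self-contained Java sketch computes Pearson's r between a CSA score and one behavioural attribute. The numbers are made-up illustrative values, not data from the study.

```java
public class PearsonSketch {

    // Pearson's correlation coefficient between two equally long value arrays.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= n;
        meanY /= n;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Hypothetical per-student values, in the same order in both arrays.
        double[] csaScore  = {0.9, 1.1, 1.3, 1.0, 1.6, 1.2};
        double[] notesRate = {0.0, 0.1, 0.3, 0.1, 0.4, 0.2};
        System.out.printf("r = %.2f%n", pearson(csaScore, notesRate));
    }
}
```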
DISCUSSION
The presented study investigated the relationship between students' behaviour in a hypertext-based learning system and their field-dependence cognitive style. The behavioural aspects under consideration were derived from findings reported in the relevant literature. Several machine learning schemes were applied and Pearson's correlation coefficient was computed, but none of them was capable of replicating the findings of prior studies. The possible reasons are discussed below.

In this study, Riding's CSA test was used to measure students' field dependence. Most of the findings presented in the pre-study are based on the Embedded Figures Test (EFT) or the Group Embedded Figures Test (GEFT). It is not clear to what extent findings based on the classical tests EFT and GEFT can be applied to subjects classified according to CSA scores. Zhang and Noyes (2006) report on two studies: in the first one, only moderate correlations between CSA and GEFT results could be observed; in the second one, neither EFT nor GEFT scores were correlated with CSA scores. It is not very surprising that no strong correlations were discovered, because one motivation for developing the CSA instrument was to overcome methodological flaws of the EFT and GEFT. On the other hand, the results from CSA-based studies could not be replicated either.

It might be possible that other individual differences not considered in the analysis also play an important role. There is some evidence that prior knowledge (Last, O'Donnell, & Kelly, 2001), gender (Ford & Chen, 2000), cultural background (Kralisch & Berendt, 2004) and computer experience (Ford & Chen, 2000) also affect hypertext navigation, tool use and performance. The omission of some of these factors might prevent a reliable diagnosis. The influence of computer experience in combination with field-dependence style on search and navigational behaviour was investigated in a study by Kim (2000). Significant differences between FD and FI subjects could only be observed for inexperienced subjects. Because all participants of the present study took courses in computer science, we can assume rather experienced subjects, for whom only marginal behavioural differences exist.

A further reason might be a lack of opportunities to display different behaviour. Exercises consisted of only one step and were largely solved without problems (74% of all exercises were solved in the first of three possible attempts). More complex exercises, possibly with a help or a hint button, would have allowed a deeper analysis of problem-solving behaviour. The contents were adapted from a paper script, with the consequence of a predominantly linear structure with only few hypertext features. This structural restrictedness of the hypertext environment might have led to similar navigation strategies of FDs and FIs. In a study presented by Ford and Chen (2000), FD subjects made greater use of topic maps for navigating than FI
ones, presumably to capitalise on the structural information provided there. This suggests that FD subjects would also use ActiveMath's table of contents (TOC) to navigate. On the other hand, the TOC is the only navigation control allowing non-linear navigation. Hence both FD and FI subjects might appreciate the TOC, albeit for different reasons. A look into the data supports this assumption: Pearson's correlation coefficients between CSA score and TOC rate (r = .00), and between CSA score and stratum (r = .03), indicate the absence of such direct connections in the analysed sample.

A further possible problem is linked with the data logging process. Web browsers offer back and forward buttons which allow movement between already visited pages. These pages are usually not requested from the web server but retrieved from the local browser cache. Because the server side is not involved in this process, these navigational moves are not available to the logging component and are missing from the action traces. This might lead to complications because back-button usage is one of the most prominent navigation actions: in a study by Tauscher and Greenberg (1997), the back button accounted for 30% of all navigation actions. Not only might a back-button attribute contribute to a better result, but the omission of back-button information also distorts the measures stratum, compactness, recurrence rate, home page rate, time per page and time per exercise attempt.

CONCLUSION
In this study, the problem of automatically diagnosing students' field-dependence style by means of machine learning techniques was addressed. For the application of styles in adaptive hypermedia systems, a reliable diagnosis is fundamental because no adaptation is still better than a false adaptation. A set of behavioural attributes was defined, based on promising empirical findings reported in the literature. The applied techniques turned out not to be capable of deriving a reliable predictive model. A subsequent analysis using Pearson's correlation coefficient could not confirm the results of prior studies either. Several potential reasons for this were discussed. In particular, general concerns arise as to whether cognitive styles can be diagnosed from behaviour that is undoubtedly the product of a multitude of individual factors.

Summary

In this article, the SIAM system for the automatic analysis of user actions in web-based learning environments is presented. It has been used in investigations with students from an introductory mathematics course to explore user behaviour in ActiveMath in relation to non-observable variables. The relationships between cognitive style and students' behaviour found in prior studies could not be replicated.
A decision tree has been generated that reveals some interesting underlying aspects in the log data, which can be used for prediction with new users of the system.

References

Arroyo, I., Murray, T., & Woolf, B. P. (2004). Inferring unobservable learning variables from students' help seeking behaviour. In Proceedings of the workshop Analyzing Student-Tutor Interaction Logs to Improve Educational Outcomes at the International Conference on Intelligent Tutoring Systems ITS 2004, (pp. 29-38), Maceio, Brasil, August.

Benchaffai, M., Debord, G., Merceron, A., & Yacef, K. (2004). TADA-Ed, a tool to visualize and mine students' online work. In Collis, B. (Ed.), Proceedings of the International Conference on Computers in Education ICCE 2004, (pp. 1891-1897), Melbourne, Australia.

Bratitsis, T., & Dimitracopoulou, A. (2005). Data recording and usage interaction analysis in asynchronous discussions: The D.I.A.S. system. In Proceedings of the workshop on Usage Analysis in Learning Systems at the 12th International Conference on Artificial Intelligence in Education AIED 2005, (pp. 17-24), Amsterdam, The Netherlands, July.

Chen, S. Y., & Macredie, R. D. (2002). Cognitive styles and hypermedia navigation: Development of a learning model. Journal of the American Society for Information Science and Technology, 53(1), 3-15.

Fischer, S., Klinkenberg, R., Mierswa, I., & Ritthoff, O. (2002). YALE: Yet another learning environment – Tutorial. Technical Report No. CI-136/02, Collaborative Research Center 531, University of Dortmund.

Ford, N., & Chen, S. Y. (2000). Individual differences, hypermedia navigation and learning: An empirical study. Journal of Educational Multimedia and Hypermedia, 9(4), 281-311.

Ford, N., & Chen, S. Y. (2001). Matching/mismatching revisited: An empirical study of learning and teaching styles. British Journal of Educational Technology, 32(1), 5-22.

Heiner, C., Beck, J., & Mostow, J. (2004). Lessons on using ITS data to answer educational research questions. In Proceedings of the workshop Analyzing Student-Tutor Interaction Logs to Improve Educational Outcomes at the International Conference on Intelligent Tutoring Systems ITS 2004, (pp. 1-9), Maceio, Brasil, August.

Kim, K. S. (2000). Effects of cognitive style on web search and navigation. In Proceedings of the Conference on Educational Multimedia, Hypermedia & Telecommunications ED-MEDIA 2000, (pp. 496-501), Montreal, June.

Kralisch, A., & Berendt, B. (2004). Cultural determinants of search behaviour on websites. In Proceedings of the Sixth International Workshop on Internationalisation of Products and Systems: Designing for Global Markets, (pp. 61-74), Vancouver.

Last, D. A., O'Donnell, A. M., & Kelly, A. E. (2001). The effects of prior knowledge and goal strength on the use of hypermedia. Journal of Educational Multimedia and Hypermedia, 10(1), 3-25.

McEneaney, J. E. (2001). Graphic and numerical methods to assess navigation in hypertext. International Journal of Human-Computer Studies, 55, 761-786.

Melis, E., Büdenbender, J., Andres, E., Frischauf, A., Goguadze, G., Libbrecht, P., Pollet, M., & Ullrich, C. (2001). ActiveMath: A generic and adaptive web-based learning environment. International Journal of Artificial Intelligence in Education, 12(4), 385-407.
Melis, E., Goguadze, G., Homik, M., Libbrecht, P., Ullrich, C., & Winterstein, S. (2006). Semantic-aware components and services of ActiveMath. British Journal of Educational Technology, 37(3), 405-423.

Merceron, A., & Yacef, K. (2003). A web-based tutoring tool with mining facilities to improve learning and teaching. In Proceedings of the 11th International Conference on Artificial Intelligence in Education AIED 2003, (pp. 201-208), IOS Press.

Merceron, A., Oliveira, C., Scholl, M., & Ullrich, C. (2004). Mining for content re-use and exchange – Solutions and problems. In Poster Proceedings of the 3rd International Semantic Web Conference ISWC 2004, (pp. 39-40), November.

Merceron, A., & Yacef, K. (2005). Educational data mining: A case study. In Proceedings of the 12th International Conference on Artificial Intelligence in Education AIED 2005, (pp. 467-474), Amsterdam, The Netherlands, IOS Press.

Mostow, J. (2004). Some useful design tactics for mining ITS data. In Proceedings of the workshop Analyzing Student-Tutor Interaction Logs to Improve Educational Outcomes at the International Conference on Intelligent Tutoring Systems ITS 2004, (pp. 20-28), Maceio, Brasil, August.

Mühlenbrock, M. (2005). Automatic action analysis in an interactive learning environment. In Proceedings of the workshop on Usage Analysis in Learning Systems at the 12th International Conference on Artificial Intelligence in Education AIED 2005, (pp. 73-80), Amsterdam, The Netherlands, July.

Oliveira, C., & Domingues, M. (2003). Data warehouse for strategic management of an e-learning system. In Proceedings of the 2nd International Conference on Multimedia and ICT in Education m-ICTE 2003, (pp. 627-631), Badajoz, Spain, December.

Quinlan, R. (1993). C4.5: Programs for machine learning. Morgan Kaufmann Publishers, San Mateo, CA.

Riding, R., & Rayner, S. (1998). Cognitive styles and learning strategies. David Fulton Publishers, London.

Tauscher, L., & Greenberg, S. (1997). Revisitation patterns in World Wide Web navigation. In Proceedings of the Conference on Human Factors CHI 1997, (pp. 399-406), ACM Press.

Witten, I. H., & Frank, E. (1999). Data mining. Morgan Kaufmann, San Francisco.

Zhang, J., & Lu, J. (2001). An educational data mining prototype. In Proceedings of the 10th International Conference on Artificial Intelligence in Education AIED 2001, (pp. 616-618), IOS Press.

Zhang, M., & Noyes, J. (2006). Riding's cognitive style model: A stable construct? In Proceedings of the 11th Annual Conference of the European Learning Styles Information Network ELSIN. University of Oslo, Norway.
Acknowledgement This publication was funded by the iClass project in the 6th Framework Programme of the European Union (Contract IST-2003-507922). The author is solely responsible for its content. The data used in the second investigation was collected in a study designed, led, and conducted by Maria Margeti as part of her doctoral work at the Institute of Education, University of London.
Towards Live Informing and Automatic Analyzing of Student Learning: Reporting in the ASSISTment System

MINGYU FENG
Worcester Polytechnic Institute, USA
[email protected]

NEIL T. HEFFERNAN
Worcester Polytechnic Institute, USA
[email protected]

Limited classroom time available in middle school mathematics classes forces teachers to choose between assisting students' development and assessing students' abilities. To help teachers make better use of their time, we are integrating assistance and assessment by utilizing a web-based system, ASSISTment, that will offer instruction to students while providing a more detailed evaluation of their abilities to the teacher than is possible under current approaches (refer to Razzaq et al., 2005, for more details about the ASSISTment system). In this article we describe the types of reports that we have designed and implemented to provide real-time reporting to teachers in their classrooms. Experiment analysis tools are also available to help researchers carry out randomized controlled learning experiments so that they can compare different tutoring strategies. Additionally, reports to principals are in progress. This reporting system is robust enough to support the 2000 students currently using our system.
Introduction

Given the limited classroom time available in mathematics classes, teachers are compelled to choose between time spent assisting students' development and time spent assessing students' abilities. To help resolve this dilemma, assistance and assessment are integrated in a web-based system called the ASSISTment system (Razzaq et al., 2005) that will offer instruction to
students while providing a more detailed evaluation of their abilities to the teachers than is possible under current approaches. The plan is for students to work on the ASSISTment website for about 20 minutes per week. Every time students work in the system, the system learns more about their abilities. Students' performance is tracked by the reporting system, which provides live online reports to inform teachers about students' learning results.

The Massachusetts Comprehensive Assessment System (MCAS)

MCAS is a high-stakes testing system required by the No Child Left Behind Act. In Massachusetts, MCAS is the graduation requirement in which all students educated in-state with public funds in the tested grades are required to participate. It is administered as a standardized test that produces rigorous tests in English, math, science and social studies for grades 3 to 10 every year. Students need to pass the math and English portions of the 10th grade versions in order to get a high school diploma. Because students are more likely to fail the mathematics portion of the test, the state is focusing efforts on mathematics. The state of Massachusetts has singled out student performance on the 8th grade math test as an area of highest need for improvement (see http://www.doe.mass.edu/mcas/2002/results/summary.pdf). Therefore, the ASSISTment project started with 8th grade math to help students get better prepared for the MCAS. In Massachusetts, the state department of education has released eight years' worth of 8th grade MCAS test items on math, totalling over 300 items, which we have turned into assistments by adding tutoring. An example of an MCAS test item can be seen in Figure 1 (without the break-down questions).

Figure 1. An ASSISTment shown while a student is working on an item, showing two scaffolding questions, one error message, and a hint message that can occur at different points.

Background on the ASSISTment System

The ASSISTment system is an e-learning and e-assessing system that is about 2.5 years old. In the 2004-2005 school year some 600+ students used the system about every two weeks. Eight math teachers from two schools would bring their students to the computer lab, at which time students would be presented with randomly selected MCAS test items. If students got an item correct, they were given a new one. If they got it wrong, they were provided with a small tutoring session where they were forced to answer a few questions that broke the problem down into steps. The key feature of assistments is that they provide instructional assistance in the process of assessing students. Razzaq et al. (2005) addressed the learning outcome of the system, and some evidence was shown that the students were learning due to the instructional assistance within the system. Though learning has been one of the focus points of our research, detailed discussion of the learning effect is beyond the scope of this article.

Each assistment consists of an original question and a list of scaffolding questions. An assistment that was built for item 19 of the 2003 MCAS is shown in Figure 1. In particular, Figure 1 shows the state of the interface
when the student is partly done with the problem. The first scaffolding question appears only if the student gets the item wrong. We see that the student typed "23" (which happened to be the most common wrong answer for this item in the data collected). After an error, students are not allowed to try the item further, but instead must answer a sequence of scaffolding questions (or scaffolds) presented one at a time. Students work through the scaffolding questions, possibly with hints, until they eventually get the problem correct. If the student presses the hint button while on the first scaffold, the first hint is displayed, which would be the definition of congruence in this example. If the student hits the hint button again, the second hint appears, which describes how to apply congruence to this problem. If the student asks for another hint, the answer is given. Once the student gets the first scaffolding question correct (by typing "AC"), the second scaffolding question appears. Error messages show up if the student types in a wrong answer anticipated by the author. Figure 1 shows an error message that appeared after the student clicked on "1/2*x*(2x)", suggesting he might be thinking about area. Once the student gets this question correct, he will be asked to solve 2x+x+8=23 for x, which is a scaffolding question that is focused on equation-solving.

So if a student got the original item wrong, what skills should be blamed? This example is meant to show that the ASSISTment system has a better chance of showing the utility of fine-grained skill modeling due to the fact that we can ask scaffolding questions that can tell whether the student got the item wrong because they did not know congruence, versus not knowing perimeter, versus not being able to set up and solve the equation. Most questions' answer fields have been converted to text entry style from the multiple choice style in which they originally appear in the MCAS tests. For logging purposes, a student is only marked as getting an original question correct if they answered it correctly before asking for any hints or encountering scaffolding.

At present, we are focused on 8th grade mathematics, and a certain amount of content (about 50 assistments, 2 hours' work) for 10th grade mathematics has been released. We believe, though, that the system is flexible enough to be used to build tutors for other subjects, such as English and physics. Our supporting website, www.assistment.org, has been running for two and a half years, providing more than 400 assistments built using our online authoring tools (Turner, Macasek, Nuzzo-Jones, Heffernan, & Koedinger, 2005; Heffernan et al., 2006), and over 2000 students from more than 20 teachers in 5 schools were using the system every two weeks during the 2005-2006 school year.

Why Do We Need a New Reporting System Beyond MCAS Reports?

Schools seek to use the yearly MCAS assessments in a data-driven manner to provide regular and ongoing feedback to teachers and students on
progress towards instructional objectives. But teachers do not want to wait six months for the state to grade the exams. Teachers and parents also want better feedback than they currently receive. While the number of mathematics skills and concepts a student needs to acquire is on the order of hundreds, the feedback on the MCAS is broken down into only five mathematical categories, known as strands. However, a detailed analysis of state tests in Texas (Confrey, Valenzuela, & Ortiz, 2002) concluded that such topic reporting is not reliable because items are not equated for difficulty within these areas. To get some intuition on why this is the case, the reader is encouraged to try the item shown in Figure 1 and then ask, "What is the most important thing that makes this item difficult?" Clearly, this item includes elements from four of the five strands: Algebra, Geometry (congruence), Number Sense (arithmetic operations) and Measurement (perimeter). Ignoring this obvious overlap, the state chose just one strand, Geometry, to classify the item, which might also be most people's first instinct. However, as we will show below, we have found evidence that there is more to this problem. The question of tagging items to learning standards is very important because teachers, principals and superintendents are all being told to be data-driven and to use the MCAS reports to adjust their instruction. As one teacher put it, "It does affect reports... because then the state sends reports that say that your kids got this problem wrong so they're bad in geometry – and you have no idea, well you don't know what it really is, whether it's algebra, measurement, or geometry." There are several reasons for this poor MCAS reporting: 1) the reasonable desire to pose problems that tap multiple knowledge components (knowledge component is the way we refer to a strand or skill in our system); 2) the fact that paper-and-pencil tests cannot figure out, given a student's response, which knowledge components to credit or blame; and 3) the fact that knowledge components dealing with decomposing and recomposing multi-step problems are currently poorly understood by cognitive science. So a teacher cannot trust that putting more effort into a low-scoring area will indeed pay off in the next round of testing. The reporting in the ASSISTment system was built to identify the difficulties that individual students – and the class as a whole – are having. It is intended that teachers will be able to use the detailed feedback to tailor their instruction to focus on the particular difficulties identified by the system. Compared to the MCAS reports, the reports provided by the ASSISTment system are live, so teachers do not need to wait. We have built multi-mapping models that allow one problem to be tagged with multiple knowledge components, and finer-grained models that break the five strands down into about 100 knowledge components and code the problems (and the scaffolding questions) with the new knowledge components. Moreover, the reporting system provides more performance analysis tools
for teachers and school principals to make comparisons among different groups and to run learning experiments.

The remainder of the article is organized as follows. The Related Work section discusses related work. The Data Source section discusses the data source used by our reporting system. In the Transfer Model section, we introduce the transfer models we have built and related work that has been done using the new transfer models. The different reports for teachers are shown in Reporting System for Teachers; we also provide teachers' feedback at the end of that section. Reports for principals are discussed in Reporting for Principals and Related Results, and the experiment analysis tools are discussed in Reporting as Learning Experiment Tools for Researchers.

Related Work

Many researchers have been interested in constructing assessment/tutoring systems for different subjects, many of which provide tutoring functionality similar to the ASSISTment system and offer various reports to help teachers guide student learning. Measures of Academic Progress (MAP - http://www.nwea.org) are state-aligned computerized adaptive tests provided by the Northwest Evaluation Association (NWEA), and MAP is also the assessment system most commonly used by Worcester Public Schools. MAP covers subjects other than math and gives similar online reports, such as class rosters, student progress reports, and class-by-subject reports, to educators to guide their instruction. Unlike the ASSISTment system, MAP, as an assessment system, provides no tutoring to help student learning, and it sticks to the strands and categorization given by the state. Therefore, it lacks the ability to analyze a problem in further detail. The Open Learning Initiative (OLI - http://www.cmu.edu/oli/) from Carnegie Mellon University provides a collection of online tutors directed at many subject areas. While the OLI provides a wide range of online tutors, the tutors lack extensibility to other tutor types and domains, resulting in a high cost for creating content. Cognitive Tutors (Koedinger et al., 2004), created by LearnLab (http://www.learnlab.org/), also provide tutoring in addition to being extendable to other domains or content. They have been successful in raising students' math test scores in high school and middle-school classrooms. Authoring tools, named CTAT, are provided to make content creation easier for experts and possible for novices in cognitive science. However, the Cognitive Tutors lack the administrative tools necessary for non-experts to effectively manage the system; they are not web-based and do not provide comprehensive reports about students' progress. The National Center for Research on Evaluation, Standards and Student Testing (CRESST) (Vendlinski et al., 2005) provides an online system (not limited to math) and a collection of tools to support the creation and distribution of content. However, the CRESST system does not offer tutoring, nor does it
provide reports for teachers; instead it allows for open-ended questions that are then evaluated by a human teacher. Effective Educational Technologies (EET) developed a series of online assessment and tutoring programs (MasteringPhysics - http://www.masteringphysics.com/, MasteringGeneralChemistry, and MasteringAstronomy) together with authoring tools for content creation. Much like the ASSISTment system, the Mastering programs give students feedback based on common wrong answers and misconceptions. By capturing the step-by-step difficulties of individual students, the Mastering platform responds to each student with individualized hints and instructions. The program provides tools to find problems of a wanted type, topic coverage, and level (which functions as a problem-difficulty report) and to monitor class/student performance via a gradebook; it tracks students' work on sub-problems (similar to the scaffolding questions in assistments) and awards partial credit when evaluating students' performance. MasteringPhysics has been widely used as a homework system, whereas the ASSISTment project has only just entered the picture. LON-CAPA (http://www.loncapa.org/) is a special assessment system because of its distributed learning content management, which allows the sharing of assessment materials across institutions and disciplines. It provides assessment analysis that gives an overview of how students are performing in their courses. The report shows all the attempts made by a student on each problem, and it can also analyse one problem across all students, which is rather simple compared with the reports in MAP, MasteringPhysics, or the ASSISTment system. Although many of the above systems provide reports for teachers, none of them offer reports for principals or tools for researchers to conduct learning experiments and analyse learning effects.

Data Source

The ASSISTment system is deployed as a completely Internet-based solution whereby students can simply open a web browser and log in to work on the problems. Our Java-based runtime system (Nuzzo-Jones, Walonoski, Heffernan, & Livak, 2005) posts each student's actions (other than mouse movements) to a message server as an XML message that includes the action timestamp, student ID, problem ID, the student's action type (did they attempt the problem or just ask for help), the student's input and the response. The messages are stored in the database server at Worcester Polytechnic Institute (WPI). As mentioned above, about 800 students of 9 teachers have been using the ASSISTment system every other week during the school year of 2004-2005. Currently, log records in our database show that about 120,000 MCAS items have been done and more than 1,500,000 actions made by these students. Since students are scheduled to use our system regularly, our database continually receives new data for the students. This allows our reporting system to assess students' performance incrementally
and to give more reliable assessment as time goes on. These large amounts of student data also offer valuable material for further learning analysis using data mining or statistical techniques.

Transfer Model

A transfer model (Croteau, Heffernan, & Koedinger, 2004) is a cognitive model that contains a group of knowledge components and maps existing questions (original items and scaffolding questions) to one or more of the knowledge components. It also indicates the number of times a particular knowledge component is applied in a given question. It is called a transfer model because we hope to use the model to predict when learning and knowledge transfer will happen. As a predictive tool, transfer models are also useful for selecting the next problem to work on. In the next section, we will show that transfer models are quite important for quality reporting. The Massachusetts Curriculum Frameworks break the five strands (referred to as the MCAS-5: Patterns, Relations and Algebra; Geometry; Data Analysis, Statistics and Probability; Measurement; Number Sense and Operations) into 39 learning standards for 8th grade math, and each item is tagged with one of the 39 standards. As shown in Figure 1, item 19 from year 2003 has been tagged with "G.2.8 Congruence and similarity," the 2nd learning standard in the Geometry strand. We have made several attempts at using the 39 MCAS learning standards (referred to as the MCAS-39) to "code up" items, first using the state's mapping with one standard per question, and then with our own coding, which allows each question to be tagged with multiple standards. However, we could not get statistically reliable coefficients on the learning standards, so we hypothesise that a finer-grained model would help. Additionally, we need a more detailed level of analysis for reporting to teachers and for predicting students' responses to questions. WPI-106 is a much finer-grained transfer model we have created at WPI with 106 knowledge components. In this model, knowledge components are arranged in a hierarchy based on prerequisite structure. So far, 78 knowledge components in this transfer model have been used to tag the assessments, together with all the scaffolding questions, in our system. Tagging the scaffolding questions enables us to assess individual knowledge components instead of only overall performance. Mappings between WPI-106 and the Massachusetts Curriculum Frameworks have been constructed by nesting a group of fine-grained knowledge components into a single category of a coarser model. Table 1 shows the hierarchical nature of the relationship between WPI-106 and the models in the Massachusetts Curriculum Frameworks. Consider the item in Figure 1, which had the first scaffolding question tagged with "congruence," the second with "perimeter," and the third with "equation-solving."
Table 1. Knowledge Components Transfer Table (WPI-106 column)
Inequality-solving; Equation-Solving; Equation-concept; …
Plot Graph; X-Y-Graph; Slope; …
Congruence; Similar Triangles; …
Perimeter; Circumference; Area; …
In the MCAS-39, the item was therefore tagged with "setting-up-and-solving-equations," "understanding-and-applying-congruence-and-similarity," and "using-measurement-formulas-and-techniques." The item was thus tagged with three skills at the level of the MCAS-5. At present, we are able to generate reports based on the Massachusetts Curriculum Frameworks as well as on the WPI-106 transfer model, which reveals more detailed information about students' knowledge and about the knowledge components contained in problems. Our most recent research (Feng, Heffernan, Mani, & Heffernan, 2006; Pardos, Heffernan, Anderson, & Heffernan, 2006) shows that WPI-106, as a finer-grained cognitive model, can produce better tracking of student performance than MCAS-5, as measured by the ability to predict student performance on the MCAS test.
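To make the notion of a transfer model concrete, the sketch below shows one minimal way such a mapping could be represented and rolled up from fine-grained to coarse knowledge components. It is only an illustration: the knowledge component names are taken from the examples above, but the data structures, function name and roll-up mapping are assumptions, not the actual ASSISTment implementation.

```python
# Hypothetical sketch of a transfer model: questions mapped to knowledge
# components (KCs), with fine-grained WPI-106-style KCs nested under coarser
# MCAS-39-style standards. The names below are illustrative, not the real tagging.
FINE_TO_COARSE = {
    "congruence": "understanding-and-applying-congruence-and-similarity",
    "perimeter": "using-measurement-formulas-and-techniques",
    "equation-solving": "setting-up-and-solving-equations",
}

# Item 19 (2003) and its scaffolds, each tagged with one or more fine KCs.
QUESTION_KCS = {
    "item19-original": ["congruence", "perimeter", "equation-solving"],
    "item19-scaffold1": ["congruence"],
    "item19-scaffold2": ["perimeter"],
    "item19-scaffold3": ["equation-solving"],
}

def coarse_tags(question_id):
    """Roll a question's fine-grained KCs up to the coarser standards."""
    return sorted({FINE_TO_COARSE[kc] for kc in QUESTION_KCS[question_id]})

if __name__ == "__main__":
    print(coarse_tags("item19-original"))
```

Because scaffolds carry their own tags, a roll-up like this is what lets a report credit or blame individual knowledge components rather than only the whole multi-skill item.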
Reporting System for Teachers

Student Grade Book Report

Right now, we have only anecdotal information from our teachers that they find the reporting helpful. Teachers seem to think highly of the ASSISTment system, not only because their students can get instructional assistance in the form of scaffolding questions and hint messages while working on real MCAS items, but also because they can get online, live reports on students' progress while students are using the system in the classroom.
The "Grade Book," shown in Figure 2, is the report teachers use most frequently. Each row represents information for one student, including how many minutes the student has worked on the assistments, how many minutes he has worked on them today, how many problems he has done and his percent correct, our prediction of his MCAS score, and his performance level. Our prediction of a student's MCAS score is at this point primitive – "rough and ready" – since the column is currently simply a function of percent correct. We might even remove the two columns related to MCAS score prediction until we feel more confident in our prediction. In past research, we found a strong correlation between our prediction for the 68 students who had used our system since May 2004 and their real MCAS raw scores (r = .7) (Razzaq et al., 2005), and we have continually refined the prediction function based on new data (see Feng, Heffernan, & Koedinger, 2006a, 2006b). In that work, we showed that we were able to predict students' MCAS scores fairly well, with a mean absolute difference of 5.533 out of a full score of 54 points. Besides presenting information at the item level, the Grade Book also summarizes the student's actions in ASSISTment metrics: how many scaffolding questions have been done, the student's performance on scaffolding questions, and how many times the student asked for a hint. The ASSISTment metrics are good measurements of the amount of assistance a student needs to finish a problem. Feng, Heffernan, and Koedinger (2006a, 2006b) found evidence that the ASSISTment system, as an online assessment system, can do a better job of predicting student knowledge by taking into consideration how much tutoring assistance was needed. In addition, the ASSISTment metrics tell more about students' actions than their performance alone. For example, they expose unusual behaviour, such as making far more attempts and requesting more hints than other students in the class, which might be evidence that a student did not take the assistments seriously or was "gaming the system" (Baker, Corbett, & Koedinger, 2004; Walonoski & Heffernan, 2006). In Figure 2, we see that these three students have used the system for about 30 minutes (many students used it for about 250 minutes during the 2004-2005 school year). "Dick" has finished 38 original items and asked for only four hints. He got most of the items correct, and thus our prediction of his MCAS score was high. We can also see that he has made
Figure 2. Grade Book on real student data
the greatest number of errors on questions tagged with standard P.1.8, understanding patterns. The student had done six problems tagged with P.1.8 and made errors on two of them. Teachers can also see that "Harry" has asked for far more hints than the others (63 compared to 4 and 15). Noticing this, a teacher could confront the student with evidence of gaming or give him a pep talk. By clicking a student's name, shown as a link in our report, teachers can even see each action a student has made, his inputs, the tutor's responses, and how much time he has spent on a given problem (not presented here for lack of space). The "Grade Book" is so detailed that a student commented, "It's spooky... he's watching everything we do," when her teacher brought students to his workstation to review their progress. By clicking the link for the most difficult knowledge component, the teacher can see what those questions were and what kind of errors the student made (see Figure 3). Knowing students' reactions to questions helps teachers improve their instruction and enables them to correct students' misunderstandings in a straightforward way. Finding out which knowledge components students find difficult also offers a chance to improve our item selection strategy. Currently, random and linear are the only two problem selection strategies supported by our runtime system. Another option could be added if we can reliably detect the difficult knowledge components of each individual student: the runtime system could preferentially pick items tagged with those hard knowledge components so that students have more opportunity to practise on their weak points.
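As a rough illustration of how Grade-Book-style rows could be aggregated from the logged actions, consider the sketch below. The record fields loosely mirror what the runtime logs (student, problem, action type, response), but the field names, the 54-point scaling of percent correct into a predicted score, and the hint threshold used to flag possible gaming are all assumptions made for the example, not the actual ASSISTment implementation.

```python
# Hypothetical aggregation of logged actions into Grade-Book-style rows.
# Field names, scaling, and thresholds are illustrative assumptions.
from collections import defaultdict

def grade_book(actions, hint_flag_threshold=30):
    """actions: list of dicts with keys student, problem, action ('attempt'
    or 'hint'), and correct (True/False for first attempts on originals)."""
    rows = defaultdict(lambda: {"items": 0, "correct": 0, "hints": 0})
    for a in actions:
        row = rows[a["student"]]
        if a["action"] == "hint":
            row["hints"] += 1
        elif a["action"] == "attempt":
            row["items"] += 1
            row["correct"] += int(a["correct"])
    report = {}
    for student, r in rows.items():
        pct = r["correct"] / r["items"] if r["items"] else 0.0
        report[student] = {
            "percent_correct": round(100 * pct),
            # Crude stand-in for the "function of percent correct" prediction.
            "predicted_mcas": round(54 * pct),
            "hints": r["hints"],
            "possible_gaming": r["hints"] > hint_flag_threshold,
        }
    return report
```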
Reports by Knowledge Component

Tagging questions with knowledge components in different transfer models enables us to develop reports that inform teachers about the knowledge status of classes and of individual students. The Class Summary report and the Student-Level Knowledge Component report were developed for this purpose. As shown in Figure 4, teachers can select their favourite transfer model and specify the number of knowledge components to be shown in the report. Knowledge components are ranked according to their correct rate, that is, the students' correct rate (shown in Figure 4 as green bars together with percent-correct values) on the items tagged with those
Figure 3. Items tagged with a difficult knowledge component
knowledge components. By clicking the name of a knowledge component (shown as a hyperlink in Figure 4), teachers are redirected to another page showing the items tagged with that knowledge component. On the new page, teachers can see the question text of each item and go on to preview or analyze the item if they want to know more about it. By presenting such a report, we hope to help teachers decide which knowledge components and items should be focused on to maximize the gain in students' scores at the class level when instructional time is limited. We would like to evaluate the effectiveness of the report by comparing the learning gain, within a limited time, of classes whose teachers have been exposed to this report against control groups for which this report is not accessible.
Figure 4. Class summary report for a teacher’s classes
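The ranking behind the Class Summary report can be illustrated with a small sketch that orders knowledge components by class-wide correct rate. It is a simplified stand-in: the input format and the function name are assumed for the example and do not reflect the report's actual implementation.

```python
# Hypothetical ranking of knowledge components (KCs) by class correct rate,
# hardest first, as in the Class Summary report. Inputs are illustrative.
def rank_knowledge_components(responses, top_n=10):
    """responses: iterable of (kc_name, correct) pairs, one per graded answer
    on an item or scaffold tagged with that KC."""
    totals, rights = {}, {}
    for kc, correct in responses:
        totals[kc] = totals.get(kc, 0) + 1
        rights[kc] = rights.get(kc, 0) + int(correct)
    rates = {kc: rights[kc] / totals[kc] for kc in totals}
    # Lowest correct rate first, so teachers see the weakest areas at the top.
    return sorted(rates.items(), key=lambda kv: kv[1])[:top_n]

print(rank_knowledge_components([
    ("congruence", False), ("congruence", True),
    ("perimeter", True), ("equation-solving", False),
]))
```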
In addition to the class-level knowledge component report, we present teachers with a student-level report (developed by Quyen Do Nguyen at WPI) that shows the knowledge status of individual students. As in the class-level report, strong and weak knowledge components are listed, but only for the particular student specified by the teacher. The student-level knowledge component report came after the class-level report and has quickly become a favourite of our cooperating teachers. Teachers appreciate that they can see in this report the weak points of a particular student in their classes, so that they can pay more attention to those knowledge components when giving instruction to that student. Since both original items and scaffolding steps have been tagged in transfer models of different grain sizes in the ASSISTment system, we claim that we can more accurately detect the knowledge components that are the real obstacles for each student.

Class Progress Report

Since our teachers have their students use the ASSISTment system every two weeks, we thought it would be helpful for teachers to track changes in students' performance by showing students' progress each time they worked on the assistments. Figure 5 shows our preliminary progress report for a teacher's class. In this report, we can see that this class has been using our system since September 21st, 2004 and has used it as a class nine times. The average of students' predicted MCAS raw scores increased from 18 to 33, and stayed at 33 for a while. (Note that we are being conservative in calculating these predicted MCAS scores, in that we calculate each student's predicted score using every item they have ever done in our system, instead of using only the items done on the day they came to the lab.) The standard deviation of scores is also displayed as a column to help teachers see performance variance in the class.
Figure 5. Preliminary progress report for a class
The progress of students' predicted MCAS raw scores over the months is shown more clearly in Figure 6. The students in the five different classes (all from school A) had been using our system for more than five months, starting in September 2004. We can see in this graph that students' predicted MCAS scores on average increased steadily with passing months (even for class Period 9, which left us for two months). In our recent work, this time-stamped, student-level progress data has been used to construct longitudinal models and thus track students' learning over time (see Feng, Heffernan, & Koedinger, 2006a).
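A longitudinal model of this kind can be sketched with an off-the-shelf mixed-effects routine: each student gets an intercept (estimated incoming knowledge) and a slope (learning rate) over time. The snippet below is only a sketch on synthetic data with assumed column names; it is not the authors' actual analysis code.

```python
# Hypothetical longitudinal (mixed-effects) fit: per-student intercepts and
# slopes for predicted score over months. Column names and data are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for student in range(30):                  # toy data: 30 students, 6 sessions
    intercept = rng.normal(20, 4)          # incoming knowledge
    slope = rng.normal(2, 0.5)             # learning rate per month
    for month in range(6):
        rows.append({"student": student, "months": month,
                     "score": intercept + slope * month + rng.normal(0, 2)})
df = pd.DataFrame(rows)

# Level-1: score ~ months within each student; level-2: intercept and slope
# vary across students (random effects).
model = smf.mixedlm("score ~ months", df, groups=df["student"], re_formula="~months")
result = model.fit()
print(result.params)              # average incoming score and average learning rate
print(result.random_effects[0])   # one student's deviation in intercept and slope
```

The per-student slopes recovered this way are what later group comparisons (by school, teacher, or demographic group) would be run on.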
Analysis of Items

A report was built to show the difficulty of each problem in our system (see Figure 7 for 5 of the 200+ lines in the report). By breaking original items into scaffolding questions and tagging the scaffolding questions with knowledge components, we are able to analyze the individual steps of a problem. Figure 8 is what we call a scaffolding report because it reports statistics
Figure 6. Predicted MCAS score over months

Item 20 N-2003 Morph (3/4 of 1 2/3) – 24%
Item 20 N-2003 (2/3 of 1 1/2) Morph2 – 26%
Item 18 G-1998 (Angle in isosceles triangle) – 27%
Item 35 G-2001 (Angle between clock hands) – 27%
Item 13 D-1998 (Eiffel Tower model) – 29%

Figure 7. Problems ordered by correct rate
on each of the scaffolding questions associated with a particular original item. On the first line of Figure 8, we see that this problem is hard, since only 12% of the students got it correct on their first attempt. Of the 180 students who have done this item so far, 154 could not get the correct answer to the original question and were thus forced by the system to go through the scaffolding questions in order to eventually solve the problem. One may notice that 154 is less than 88% of 180, which should be about 158, and that the number of attempts on later scaffolding questions drops further. That is because students could log out and log back in to redo the original question and avoid going through all the scaffolding questions; this problem has since been fixed. 56% of students asked for a hint, which tells you something about students' confidence when confronted with this item. (It is useful to compare such numbers across problems to learn which items students think they need help on but don't, and vice versa.) Recall that the state classified the item according to its congruence standard (G.2.8), shown in bold. The other MA learning standards (M.3.8, P.7.8) are the standards we added in our first attempt to code items using the MCAS-39 standards. We see that only 23% of students who got the original item incorrect could correctly answer the first scaffolding question, lending support to the idea that congruence is tough. But we also see a percent correct as low as 25% on the 3rd question, which asks students to solve for x. This statistical result gives us a good reason to tag "P.7.8 setting-up-and-solving-equations" to the problem. Teachers want to know which particular skills or knowledge components cause trouble for students while solving problems. Unfortunately, the MCAS is not designed to be cognitively diagnostic. Given that the scaffolding report can provide a lower level of cognitive diagnosis, our cooperating teachers have carefully designed scaffolding questions for those tough problems to find out the answer.
Figure 8. A scaffolding report generated by ASSISTment reporting system
For example, one teacher designed an assistment for item 20 of the 2003 8th grade MCAS ("What is 3/4 of 1 1/2?"). The first scaffolding question for the assistment is "What mathematical operation does the word 'of' represent in the problem?" This teacher said, "Want to see an item that 97% of my students got wrong? Here it is… and it is because they don't know 'of' means they should multiply." The report confirmed the hypothesis: 40% of students could not select multiplication, with 11 of them selecting division. The scaffolding report has also helped us develop our tutors in an iterative way. For each question, the report shows the top common errors and the corresponding error messages. When building the assistments, we have tried to catch common errors students could make and give them instructive directions based on each specific error, such as correcting students' misunderstanding of the question text or of knowledge concepts. But given that students may have different understandings of concepts, assistments may give no messages for some errors, which means our tutor loses chances to tutor students. Also, students may feel frustrated if they are continually told "You are wrong" but get nothing instructive or encouraging. As shown in Figure 8, the wrong answer "15" to the third question has been given 13 times, but the assistment gave no instructive message. Noticing this, the assistment builders can improve their tutor online by adding a proper error message for this error. We also display a table in the scaffolding report that we call the "Red & Green" distribution matrix, shown in Table 2. Numbers in the cells show how many students got a question correct (indicated by green numbers in unshaded cells) or wrong (indicated by red numbers in shaded cells). We split the numbers as the questions' sequence number grows, so that the matrix also represents how those students did on previous questions. In this example, we see that four students who answered the original question wrong went through all of the scaffolding questions correctly. Given that, we tend to believe those students have mastered the knowledge components required by each step but need instruction on how to "compose" those steps. It is also worth pointing out that there are eight students who answered the original question wrong but answered the last question correctly, which asks the same question
as the original one. Since the assistment breaks the whole problem into scaffolding steps and gives hints and error messages, we would like to believe that those students learned from working on the previous steps of this assistment.

Table 2. "Red & Green" distribution matrix
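A simplified version of such a matrix can be tallied directly from the response records by counting, for each question in the sequence, how many students were correct or wrong given their path so far. The sketch below is an illustration only; the input format and the path encoding are assumptions, not the report's actual implementation.

```python
# Hypothetical tally behind a "Red & Green"-style matrix: for each question
# in the assistment, count correct/wrong split by the student's earlier path.
from collections import Counter

def red_green_matrix(student_sequences):
    """student_sequences: dict student -> list of booleans, one entry per
    question in order (original item first, then each scaffold)."""
    cells = Counter()
    for answers in student_sequences.values():
        path = ""
        for position, correct in enumerate(answers):
            cells[(position, path, correct)] += 1
            path += "R" if correct else "W"   # rights/wrongs seen so far
    return cells

counts = red_green_matrix({
    "s1": [False, True, True, True],  # wrong on original, all scaffolds right
    "s2": [True],                     # right on original, no scaffolds shown
})
print(counts[(1, "W", True)])  # students correct on scaffold 1 after a wrong original
```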
Performance Evaluation

Our reporting system was first used in May 2004. Initially it worked well: most reports at the class level could be generated in less than 10 seconds, and it took 10 to 20 seconds to generate a scaffolding report at the system level, depending on the number of students who had worked on the item and the number of scaffolding questions the item has. Performance degraded when the number of recorded student actions grew past 1 million; in particular, we have seen the Grade Book report take more than two minutes, which we consider unacceptable for a live report. We then switched to an Oracle database, which provides mechanisms such as views and stored procedures to improve query performance, and we also updated the approaches used to generate the reports. Now we can generate the Grade Book report in about seven seconds on average, and the system-level scaffolding report for item 19 (see Figure 8) in about five seconds.
Teachers' Attitudes Towards the System

Our cooperating teachers have said many positive things about the system. To collect usage feedback from teachers, we created an online survey about teachers' attitudes towards the ASSISTment system and how they used its data during the 2005-2006 school year. The responses were positive. Teachers in general liked that the assistments lead students step by step when they answer a question incorrectly, and that "it was great to have the hints that are tailored to their individual needs." They also considered using the system to be good MCAS practice and appreciated that they receive scores at the end of the class. Among the 11 teachers who responded, eight strongly agreed that their students learned by using the system and three agreed somewhat. Nine of the teachers would consider assigning assistment problems as homework for students with computers at home. We noticed a discrepancy: although eight of the 11 teachers thought the data provided by the system was helpful, only three teachers said that they used the data to change their instruction in class, while seven others said they only did so somewhat. We hypothesize that one reason for this difference may be the availability of the reports. Most teachers are not accustomed to frequently logging into the system to access the reports on their own; to some teachers, doing so also adds extra work. Indeed, when asked their opinion on receiving automatic email reports, nine teachers thought that would be great, since it would be "a much
easier and faster way of obtaining the information" and would eliminate work for them, thus allowing "more time to focus on certain strategies or concepts in class." Developers at WPI are now working on automatically generating and emailing certain reports, as described below. Another thing we care about is how teachers use the reports. In the survey, most teachers only mentioned that they reviewed commonly missed problems with the whole class, which indicates that many functions provided by the reporting system have been ignored. Again, availability of the reports might be one explanation. Another reason, we speculate, may lie in the fact that the different reports in the system are not very well organized and there is no demonstration or functional specification on the website to help people get started. One teacher did say that she/he was not able to use the data until she/he was shown (by the second author), step by step, how to retrieve the information and then how to make use of it. We are now seeking better communication approaches to help teachers discover the real value of the reports.

Reporting for Principals and Related Results

Most of the reports described in the previous section were aimed at teachers. To supplement those reports, we have been working on new reports for principals and administrators which will allow them to see, on a wider scope across teachers and classes, which groups of students need the most attention, based on their gender, special education status, whether they get free lunch, and whether they belong to underrepresented groups. Given these reports, users can also compare teachers and see which schools and teachers produced more learning than others. The reports are made possible by the fact that we have trained longitudinal data analysis models (Feng, Heffernan, & Koedinger, 2006a). Though the reports themselves are still under development, we describe here the supporting longitudinal data analysis approach and show the results we obtained from the data collected during the 2004-2005 school year. Singer and Willett (2003) style longitudinal data analysis is an approach for investigating change over time, in this case the change in students' performance over the course of the year. It allows us to learn a slope that represents a student's learning rate and an intercept that represents the estimate of incoming knowledge for each individual student. This is achieved by fitting a multilevel statistical model (also referred to as a mixed-effects model) that simultaneously builds two sub-models: the level-1 sub-model fits within-person change and describes how individuals change over time, and the level-2 sub-model tracks between-person change and describes how these changes vary across individuals. We applied this longitudinal analysis approach to the log data of 324 students from eight different teachers' classes in two schools and obtained the slope (i.e., the learning rate) and intercept (i.e., the incoming
knowledge) for each student. For all these students, we recorded certain characteristics such as their gender, special education status, whether they get free lunch and whether they belong to traditionally underrepresented groups. The first thing we wanted to test is whether the students from the two schools differ in their learning rate. Before doing this, we noticed that there was a clear difference between the schools in incoming students' scores, which makes sense given that one school draws students from the more affluent side of town. We then ran an ANOVA on the slope with school as the effect. The result showed a p-value smaller than 0.0001 with an effect size of 0.595 (see Figure 9), which suggested that one school has caused more learning in students than the other.

Figure 9. Comparison of learning rates of schools

We then switched to comparing teachers. We ran an ANOVA using teacher as a factor and got a p-value that was statistically significant (p < 0.0001). This result led us to conclude that some teachers have done a better job of helping students learn than others. The next step was to investigate which groups of students showed more knowledge gain over the same period of time, as measured by their learning rate. We are especially interested in questions such as, "Which group is better at learning math: boys or girls?" and "Do students from under/over-represented groups show different rates of learning in math?" To answer these questions, we tried different factors in the ANOVA, namely gender, under/over-represented status, special education status, and free lunch or not. It turned out that, for the selected data, none of these factors were statistically significant (p > 0.05). Among all these tests, the difference in the slope parameter for free lunch was near significance (p = 0.08) and suggested that the students who got free lunch showed more learning than those who did not. Given this, we went on to test whether there is a difference in the incoming scores of these two groups of students and found that students who did not get free lunch started with a significantly higher score (3.33 points higher, p < 0.0001). This is consistent with a general pattern we had found before, namely that groups with higher estimated initial scores showed lower rates of learning. Our preliminary speculation on this is that 1) it may be attributed to a ceiling effect: it is hard for top students to
make fast progress; and 2) good students were assigned to algebra classes and were learning content that won't be tested until 10th grade and does not appear in the ASSISTment system. Further investigation is needed to explain this phenomenon. Currently, we are working on automating all the above analyses and implementing the corresponding reports.

Reporting as Learning Experiment Tools for Researchers

The ASSISTment system allows randomized controlled experiments to be carried out fairly easily (Razzaq et al., 2005). There is control for the number of items presented to a student, and the system will soon be able to support control for time. Problems are arranged in curriculums in the system. A curriculum can be conceptually subdivided into two main pieces: the curriculum itself and sections. The curriculum is composed of one or more sections, with each section containing problems or other sections. This recursive structure (sketched below, after Figure 10) allows for a rich hierarchy of different types of sections and problems. The section component is an abstraction for a particular listing of problems. This abstraction has been extended to implement our current section types and allows for future expansion of the curriculum unit. Currently existing section types include Linear (problems or sub-sections are presented in linear order), Random (problems or sub-sections are presented in a pseudo-random order), and Experiment (a single problem or sub-section is selected pseudo-randomly from a list; the others are ignored). Researchers can select items to put into experiment curriculums and then assign them to classes. Figure 10 shows a real experiment (Razzaq et al., 2005) that was designed to compare two different tutoring strategies for proportional reasoning problems and investigated whether students would learn better if asked to set up proportions. The item is from the 2003 MCAS: "The ratio of boys to girls in Meg's chorus is 3 to 4. If there are 20 girls in her chorus, how many boys are there?" The author built two different assistments that differed only by one extra scaffolding question. One of the conditions involved coaching the students to solve the problem by first setting up the proportion, while the other did not use the formal notion of proportion. The author made a second, morphed version of each by changing the cover story. Finally, the author selected two items as a posttest for "far transfer" (see Figure 10). Students participating in the experiment are randomly assigned to either condition. After they finish the first two items in the random section, they all encounter the far transfer items as a posttest. Participants' performance on the posttest, as well as on the second item within the condition, is used to evaluate the effectiveness of the different tutoring strategies. The experiment set-up/analysis tools (implemented mainly by Shane Gibbons and Emilia Holban at WPI) were developed to facilitate the running
of experiments. The set-up tool allows researchers to schedule when they want to be notified of the results of their experiments, during or after the experiments are carried out. They can get daily, weekly or monthly reports on the status of their experiments, and notification can also be triggered based on statistical criteria (the effect size, the p-value, or the number of subjects who have participated in the experiments). If they like, users can type in their email address and have the reports waiting in their mailbox when the analysis is done. Once the experiments are set up and running, the system automatically does the analysis and presents the reports online (see Figure 11) or sends the results to users' mailboxes according to the settings. There are two types of analyses the project is interested in fully automating. The first is to run the appropriate ANOVA to see if there is a difference in performance on the transfer items by condition; the second is to look for learning within each condition and see if there is a disproportionate amount of learning by condition. Figure 12 shows the "SetupRatio" condition to have better learning within the condition as well as better learning on the posttest/transfer items (reported in Razzaq et al., 2005). Different kinds of experiments have been run in the ASSISTment system. In addition to the one described above, which investigates how different coaching strategies affect learning, experiments have been run to answer the question of whether scaffolding questions are useful compared to just giving hints on the original questions.
Figure 10. An experiment curriculum
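The curriculum and section structure described above can be pictured as a small composite data structure in which Linear, Random and Experiment sections each decide how their children are presented. The sketch below is only an illustrative reading of that description; the class and method names are assumptions, not the system's actual (Java-based) implementation.

```python
# Hypothetical sketch of the recursive curriculum structure: sections contain
# problems or other sections; the section type decides presentation order.
import random

class Problem:
    def __init__(self, name):
        self.name = name
    def flatten(self):
        return [self.name]

class Section:
    def __init__(self, kind, children):
        self.kind = kind          # "linear", "random", or "experiment"
        self.children = children  # Problems and/or nested Sections
    def flatten(self):
        kids = list(self.children)
        if self.kind == "random":
            random.shuffle(kids)
        elif self.kind == "experiment":
            kids = [random.choice(kids)]  # one condition chosen, others ignored
        return [name for child in kids for name in child.flatten()]

# Toy curriculum: an experiment section picks one condition pseudo-randomly,
# then every student sees the same far-transfer posttest items.
curriculum = Section("linear", [
    Section("experiment", [Problem("SetupRatio condition"),
                           Problem("NoSetup condition")]),
    Section("linear", [Problem("transfer item 1"), Problem("transfer item 2")]),
])
print(curriculum.flatten())
```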
The survey results indicated that some students found being forced to do scaffolding sometimes frustrating, and we were not sure whether all the time we had invested in these scaffolding questions was worth it. Thus, a simple experiment was conducted to find the answer, and the results showed that students who were given the scaffolds performed better, although the results were not always statistically significant (Razzaq & Heffernan, 2006).

CONCLUSIONS
In conclusion, we feel that we have developed some state-of-the-art online reporting tools that will help teachers and researchers be better informed about what their students know. Our implicit evaluation is that we have made it possible for all these reports to work live in the classroom. We still have a lot to do in further automating the statistical analysis of learning experiments. We have done some learning analysis with this year's data set, involving over 800 students and 30 Learning Opportunity Groups. In particular, we see that students perform about 5% better on their second opportunity, and
Figure 11. Online experiment analysis report
Figure 12. Learning results
this was statistically significant (Razzaq et al., 2005). Also, since doing learning analysis by hand is both time-consuming and fallible, another aim of our reporting system is to automate the learning analysis process. We have done some preliminary work in this direction: letting teachers create content, and automatically sending them emails when we know that their content is better (or worse) than what we are currently using in the ASSISTment system.

References

Baker, R. S., Corbett, A. T., & Koedinger, K. R. (2004). Detecting student misuse of intelligent tutoring systems. Proceedings of the 7th International Conference on Intelligent Tutoring Systems. Maceio, Brazil.
Collins, A., Brown, J. S., & Newman, S. E. (1989). Cognitive apprenticeship: Teaching the crafts of reading, writing, and mathematics. In L. B. Resnick (Ed.), Knowing, learning, and instruction: Essays in honor of Robert Glaser (pp. 453-494). Hillsdale, NJ: Lawrence Erlbaum Associates.
Confrey, J., Valenzuela, A., & Ortiz, A. (2002). Recommendations to the Texas State Board of Education on the setting of the TAKS standards: A call to responsible action. From http://www.syrce.org/State_Board.htm
Croteau, E., Heffernan, N. T., & Koedinger, K. R. (2004). Why are algebra word problems difficult? Using tutorial log files and the power law of learning to select the best fitting cognitive model. Proceedings of the 7th International Conference on Intelligent Tutoring Systems. Maceio, Brazil.
Feng, M., Heffernan, N. T., & Koedinger, K. R. (2006a). Addressing the testing challenge with a web-based e-assessment system that tutors as it assesses. Proceedings of the Fifteenth International World Wide Web Conference (pp. 307-316). New York, NY: ACM Press.
Feng, M., Heffernan, N. T., & Koedinger, K. R. (2006b). Predicting state test scores better with intelligent tutoring systems: Developing metrics to measure assistance required. In Ikeda, Ashley & Chan (Eds.), Proceedings of the Eighth International Conference on Intelligent Tutoring Systems (pp. 31-40). Heidelberg, Germany: Springer Berlin.
Feng, M., Heffernan, N. T., Mani, M., & Heffernan, C. (2006). Using mixed-effects modeling to compare different grain-sized skill models. In Beck, J., Aimeur, E., & Barnes, T. (Eds.), Educational Data Mining: Papers from the AAAI Workshop (pp. 57-66). Menlo Park, CA: AAAI Press. Technical Report WS-06-05. ISBN 978-1-57735-287-7.
Heffernan, N. T., Turner, T. E., Lourenco, A. L. N., Macasek, M. A., Nuzzo-Jones, G., & Koedinger, K. R. (2006). The ASSISTment builder: Towards an analysis of cost effectiveness of ITS creation. Proceedings of the 19th International FLAIRS Conference. Florida.
Koedinger, K. R., Aleven, V., Heffernan, N. T., McLaren, B., & Hockenberry, M. (2004). Opening the door to non-programmers: Authoring intelligent tutor behavior by demonstration. Proceedings of the 7th International Conference on Intelligent Tutoring Systems (pp. 162-173). Maceio, Brazil.
Mostow, J., Beck, J. E., Chalasani, R., Cuneo, A., & Jia, P. (2002). Viewing and analyzing multimodal human-computer tutorial dialogue: A database approach. Proceedings of the Fourth IEEE International Conference on Multimodal Interfaces (ICMI 2002).
Nuzzo-Jones, G., Walonoski, J. A., Heffernan, N. T., & Livak, T. (2005). The eXtensible tutor architecture: A new foundation for ITS. In C. K. Looi, G. McCalla, B. Bredeweg, & J. Breuker (Eds.), Proceedings of the 12th International Conference on Artificial Intelligence in Education (pp. 902-904). Amsterdam: IOS Press.
Pardos, Z. A., Heffernan, N. T., Anderson, B., & Heffernan, C. (2006). Using fine-grained skill models to fit student performance with Bayesian networks. Workshop on Educational Data Mining held at the 8th International Conference on Intelligent Tutoring Systems. Taiwan.
Razzaq, L., Feng, M., Nuzzo-Jones, G., Heffernan, N. T., Aniszczyk, C., Choksey, S., Livak, T., Mercado, E., Turner, T., Upalekar, R., Walonoski, J., Macasek, M., Rasmussen, K., Koedinger, K., Junker, B., Knight, A., & Ritter, S. (2005). The Assistment project: Blending assessment and assisting. In C. K. Looi, G. McCalla, B. Bredeweg, & J. Breuker (Eds.), Proceedings of the 12th International Conference on Artificial Intelligence in Education (pp. 555-562). Amsterdam: IOS Press.
Razzaq, L., & Heffernan, N. T. (2006). Scaffolding vs. hints in the Assistment system. In Ikeda, Ashley & Chan (Eds.), Proceedings of the Eighth International Conference on Intelligent Tutoring Systems (pp. 635-644). Heidelberg, Germany: Springer Berlin.
Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and occurrence. New York, NY: Oxford University Press.
Turner, T. E., Macasek, M. A., Nuzzo-Jones, G., Heffernan, N. T., & Koedinger, K. R. (2005). The Assistment builder: A rapid development tool for ITS. In C. K. Looi, G. McCalla, B. Bredeweg, & J. Breuker (Eds.), Proceedings of the 12th International Conference on Artificial Intelligence in Education (pp. 929-931). Amsterdam: IOS Press.
Vendlinski, T., Niemi, D., Wang, J., Monempour, S., & Lee, J. (2005). Improving formative assessment practice with educational information technology. American Educational Research Association 2005 Annual Meeting.
Walonoski, J., & Heffernan, N. T. (2006). Detection and analysis of off-task gaming behavior in intelligent tutoring systems. In Ikeda, Ashley & Chan (Eds.), Proceedings of the Eighth International Conference on Intelligent Tutoring Systems (pp. 382-391). Heidelberg, Germany: Springer Berlin.
Beyond Logging of Fingertip Actions: Analysis of Collaborative Learning Using Multiple Sources of Data

NIKOLAOS AVOURIS, GEORGIOS FIOTAKIS, GEORGIOS KAHRIMANIS, MELETIS MARGARITIS, AND VASSILIS KOMIS
University of Patras, Greece
[email protected]

In this article, we discuss key requirements for collecting behavioural data concerning technology-supported collaborative learning activities. It is argued that the common practice of analysing computer-generated log files of user interactions with software tools is not enough for building a thorough view of the activity. Instead, more contextual information needs to be captured in multiple media, such as video, audio files, and snapshots, in order to reconstruct the learning process. A software environment, the Collaborative Analysis Tool (ColAT), that supports the interrelation of such resources in order to analyse the collected evidence and produce interpretative views of the activity is described.
Introduction

The collection of usage data by registering users' operations in the form of log files has become commonplace in technology-supported learning activities. Many researchers assume that learning and cognitive processes can, in principle, be inferred from studying and analysing this recorded behaviour (Hulshof, 2004). Logfile analysis can be used when the purpose is to infer the cognitive processes and social behaviour of the persons who interact with software tools. Analysis can then be performed in a number of ways, for example by examining the frequency with which different operations are carried out or by focusing on the sequence in which operations occur. Analysis of a learning activity is important for understanding
the complex processes involved and for improving the effectiveness of collaborative learning approaches, and it can be used as a reflection-support mechanism for the actors involved. Tools to support interaction and collaboration analysis have been proposed in the fields of learning technology design and human-computer interaction (Dix, Finlay, Abowd, & Beale, 2004). In the education field, analysis of collaboration and interaction between the actors – such as students and tutors – the artefacts and the environment is a process that can support understanding of learning, evaluate the educational result and support the design of effective technology (Gassner, Jansen, Harrer, Herrmann & Hoppe, 2003). Many researchers have studied the problem of combining multiple sources of data during interaction analysis. For example, Heraud, Marty, France and Carron (2005) proposed combining keystroke log files and web logs. However, the more challenging question, discussed in this article, is how to combine structured data, like log files, with unstructured data, like audio and video recordings, in the same environment. In this article, we first describe the typical characteristics of a software environment that records users' operations and then supports their analysis both during the activity and offline. In the second part of the article, we argue further that, while this approach is useful, more contextual information needs to be interrelated with the collected log files. An innovative analysis tool (ColAT) is then presented that can be used for effective analysis of the interrelated multiple data that may be collected during technology-supported learning activities.

Logfile-based Analysis of Learning Activities

One of the new opportunities that information and communication technologies offer to learning activities is related to the automatic logging of actions by the computer environments used. The outcome of this process, in the form of a log file, may be used for analysing and evaluating learning activities. Evaluation can then lead to improvement of learning practices and of the computer tools used. A suitable field for the application of log file analysis is Computer-Supported Collaborative Learning (CSCL). Evaluation of individual computer-supported learning activities often involves comparisons of pre- and post-tests indicating students' levels of knowledge. What is assumed by this practice is that learning activities cause individual cognitive processes that are not accessible per se, but only through their outcomes. By contrast, during collaborative learning social interaction is added to the learning activity, so what one participant communicates to others is accessible to researchers, facilitating analysis of the learning process (Dillenbourg, Baker, Blaye & Malley, 1996). The computer is often used as a tool facilitating peer interaction and communication, so a record of social activity is added to that of interaction with learning content or problem-solving operations. The state of evolving
knowledge must in this case be continuously made visible by the collaboration participants to each other (Stahl, 2001). So logging and analysing user-computer tool interactions is of added value in CSCL. There are many different approaches to log file analysis, especially in the case of collaborative activities. In the next section, some of them are presented through a collaborative problem-solving environment that integrates a wide range of log file analysis tools.

Logfile-based Analysis with the Use of a CSCL Environment

In this section, we describe the functionality of a typical environment for analysis of group learning, called Synergo (www.synergo.gr), associated with a synchronous collaboration-support environment which permits direct communication and problem-solving activity by a group of distant students manipulating a shared graphical representation (Avouris, Margaritis & Komis, 2004). Synergo keeps track of user operations. It also incorporates tools for the analysis of these usage log files. Through them, the researcher can play back the recorded activity offline and annotate the jointly produced problem solution, usually in a graphical form (e.g., a concept map, a flow chart, etc.), while various indicators and views of the log files can be produced. In a typical synchronous collaborative learning situation in which Synergo is used, two or more actors, supported by networked equipment, collaborate at a distance by communicating directly through an integrated chat tool and by acting in a shared activity space. A graphic representation of a solution to a given problem appears in this shared activity space. This activity is typically tracked through logging of the main events of the actors in the shared space and of the text dialogue events. The Synergo analysis tools are used mainly for the presentation and processing of these log files, produced during collaborative learning activities. These log files (see an example at the top of Figure 1) contain time-stamped events, in sequential order, which concern the actions and exchanged text messages of the partners engaged in the activity. These events have the following structure: {<id>, <absolute time>, <relative time>, <user>, <event-type>, <attributes>}.
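To make the record format concrete, here is a minimal sketch of how one such record could be parsed, assuming a bracketed "Key = value" notation like the example given next; the parsing code and field names are illustrative assumptions, not Synergo's actual file format.

```python
# Hypothetical parser for a Synergo-style log event written as
# "{ID = 623, Time1 = 00:18:11, Time2 = 02:02:28, User = hlias, ...}".
# The notation and field names are assumed for illustration only.
import re

def parse_event(line):
    """Split a '{Key = value, Key = value, ...}' record into a dict."""
    body = line.strip().strip("{}")
    event = {}
    # Split only on commas that are followed by another capitalised 'Key =' pair,
    # so commas inside the Attributes value (e.g. "x=320, y=304") are preserved.
    for part in re.split(r",\s*(?=[A-Z]\w*\s*=)", body):
        key, _, value = part.partition("=")
        event[key.strip()] = value.strip().strip('"')
    return event

record = ('{ID = 623, Time1 = 00:18:11, Time2 = 02:02:28, User = hlias, '
          'Action = "Insert Concept Relationship", '
          'Attributes = "qualitative(57), x=320, y=304"}')
print(parse_event(record)["User"])   # -> hlias
```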
Some of these fields take their values automatically from the Synergo software. An example from the log file of Figure 1 is the following: {ID = 623, Time1 = 00:18:11, Time2 = 02:02:28, User = hlias, Action = "Insert Concept Relationship," Attributes = "qualitative(57), x=320, y=304"}. This is a record of an event produced at 00:18:11, occurring 02:02:28 after the beginning of the activity (relative time), by user Hlias, who inserted an object in the shared activity space at position x=320, y=304. Some more attributes can be associated with the log file records. The <event type> attribute categorizes the recorded event. This categorization can be done by interpreting the log file events manually, one by one.
Figure 1. Synergo analysis tools: The log file (top of the picture) is processed to produce statistical indicators across various dimensions (type of event, time slot, actor), shown in (a). The group sessions over time are shown in (b), while in (c) and (d) the statistical indicators are plotted vs. time

The Synergo environment facilitates this tedious process by allowing the association of kinds of events, automatically generated by the software, with classes. So, for instance, all events of type "Change of textual description of concepts" in a concept-mapping tool are associated with the "Modification" general type of action, as shown in Figure 2. Following this first level of automatic annotation of the log file, statistics and visual views concerning the activity can be automatically generated. For instance, Figure 1 shows some of the views automatically generated by the Synergo analysis tools.
Figure 2. Definition of an event typology scheme: the low-level recorded events generated by the software (left) are grouped into action types (right)
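A first level of automatic annotation like the one in Figure 2, together with the per-actor and per-type counts of Figure 1(a), can be sketched in a few lines. The event-to-class mapping and the field names below are invented for the illustration; they are not Synergo's actual typology.

```python
# Hypothetical first-level annotation: map low-level event types to broader
# action classes, then count events per (actor, class), as in Figure 1(a).
from collections import Counter

EVENT_CLASS = {
    "Insert Concept Relationship": "Insertion",
    "Change of textual description of concepts": "Modification",
    "Chat message": "Dialogue",
}

def annotate_and_count(events):
    """events: list of dicts with 'User' and 'Action' keys (see parse_event)."""
    counts = Counter()
    for e in events:
        action_class = EVENT_CLASS.get(e["Action"], "Other")
        counts[(e["User"], action_class)] += 1
    return counts

events = [
    {"User": "hlias", "Action": "Insert Concept Relationship"},
    {"User": "maria", "Action": "Chat message"},
    {"User": "hlias", "Action": "Chat message"},
]
print(annotate_and_count(events)[("hlias", "Dialogue")])  # -> 1
```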
This is an extract from a log file generated by a pair of students in a distance learning course who interacted for over seven hours (462 minutes of interaction spread over 8 sessions). In Figure 1(a), the recorded events are grouped by user and type of event in the top table, and by time interval and type of event in the second. The analyst can observe the values of various indicators, such as the number of events of type "insert new object in the activity space" per time interval, shown in Figure 1(c), or an interaction diagram indicating the activity per partner for a specific type of event, like chat messages between two partners, in Figure 1(d). Finally, a view related to the length of the sessions is shown in Figure 1(b). These representations can have some value for a trained analyst or teacher, or they can be used as self-awareness mechanisms for students, as they can be presented to them during collaborative activities. Not all recorded events, however, can be automatically annotated in this way, while important events are not captured at all by the log file because they do not occur as a result of user-tool interaction (i.e., user fingertip activity). For instance, face-to-face dialogues have to be captured through other media and interpreted by the analyst; only after establishing their meaning and the intention of the interlocutor can they be annotated accordingly. There are various modes of interaction: for instance, a student's suggestion to modify part of the solution can be made either through verbal interaction or through direct manipulation of the objects concerned in the shared activity space. In addition, more complex indicators may be generated. An example is the graph of the evolution of the Collaboration Factor, discussed in Avouris, Margaritis, and Komis (2004). This index reflects the degree of contribution of the actors to the solution of a problem-solving task, taking into account the relative weights of the actors, the components of the solution, and the types of actions. The Collaboration Activity Function (Fesakis, Petrou & Dimitracopoulou, 2003) constitutes a similar index that calculates a value for collaboration by taking into
The Collaboration Activity Function (Fesakis, Petrou & Dimitracopoulou, 2003) is a similar index that calculates a value for collaboration by taking into consideration the actions performed by users in collaborative environments through all collaboration channels (e.g., shared workspace and chat). In larger group settings, sociograms, that is, graphic representations of the social links between students based on the quality and quantity of interactions between them, may be used for representing group relations (Reffay & Chanier, 2003). In general, it has been observed that many log files, like the Synergo log file presented in this section, bear many similarities, both in synchronous and asynchronous collaboration support environments. It is therefore possible to define a common format and an ontology for representing the semantics of collaborative learning log file data (e.g., Kahrimanis, Papasalouros, Avouris & Retalis, 2006), thus facilitating the exchange and interpretation of log files by different researchers. However, despite the increased popularity of log file-based analysis of learning activities, and the useful data and views generated from them, these views may not be enough for gaining a full understanding of the activity and may lead to false interpretations and conclusions. In the following section, the main concerns and shortcomings of analyses restricted to log file data alone are reported. First, the limitations due to the loss of information conveyed through additional communication channels in collocated and distant settings are discussed, followed by the specific requirements of mobile learning situations.

Shortcomings of the Log File Analysis Approach

Computer-supported collaborative activities can, at the simplest level, be classified along two dimensions: a spatial and a temporal one. On the spatial axis, collaboration activities are divided into collocated and distant ones. On the temporal axis, the distinction is between synchronous and asynchronous activities. Log file analysis is not equally suitable for all modes of communication, as discussed in the following.
The Case of Collocated CSCL Activities

The most problematic cases of using log files as the only input to analysis are collocated collaboration activities. In such activities, the computer tool used constitutes just one of many different communication channels. Such a setting does not inhibit oral communication, even if the tool used provides support for exchanging text messages. Furthermore, secondary channels of face to face communication may convey important meaning for the analysis of the activity. Gestures, intonation, facial expressions and the posture of students should not be neglected in such cases. Moreover, the structure of oral dialogues, in contrast to typed messages, is not easily defined. Important information that has to be considered refers to turn-taking, overlapping, hesitation of one partner, or intervals of silence. When students are expected to collaborate face to face, the inadequacy of log files for analysis is rather obvious. However, there are cases (e.g., in a
computer-equipped classroom) where students are not supposed to have direct face to face communication but actually do so. For example, a CSCL environment like Synergo may be used, which provides a synchronous chat tool. In our experience, it is not unlikely that collaborating students occasionally engage in oral dialogues during a problem solving activity, even if they have to move from their workstations. Such cases may be tricky for an analyst, because the bulk of the communication is conveyed through the CSCL tool, but important information communicated orally may escape their attention.
The Case of Distant CSCL Activities – An Example

In distant CSCL activities, researchers and activity designers often seem to have a misleading perception of the nature of CSCL activities. They sometimes develop strict educational scripts, provide certain CSCL tools to the students, and require the students to conduct the activity according to the given directives. However, in practice, students prove to be surprisingly flexible in their use of computer tools – they adopt alternative media in order to interact with their peers. Use of email, instant messengers and asynchronous discussion forums are the most common examples. The fact that researchers, in contrast to face to face collaboration, cannot physically observe interactions may lead them to completely ignore such practices. An experience related to such practices is reported from a cross-national CSCL activity between Greek and German universities that provides an example of both synchronous and asynchronous collaboration (Harrer, Kahrimanis, Zeini, Bollen & Avouris, 2006). Students from both universities were assigned a task as homework. They were requested to work in dyads with a distant partner, using the provided collaboration support tools. In addition, an asynchronous discussion forum was set up, so that students could exchange messages to get to know each other better and plan their work. Students were asked to deliver a report on the activity containing data from any tools used, in order to demonstrate their collaborative activity. This scenario left the students a lot of freedom in approaching their task, in terms of when and how to work together or how to divide the work. The facilitators of the activity, who were researchers aiming to study this kind of distant collaboration activity, preferred to give the students such freedom instead of setting up a more contained lab situation. The latter might have been preferable for a controlled analysis of some collaboration aspects, but would have produced an artificial environment that would not have connected well with the students' real-world experiences. The reports gathered at the end of the activity revealed that most pairs used additional tools in order to communicate. Five out of ten groups used an instant messenger application and 50% of the groups exchanged email messages. This was rather surprising, given that the students had
plenty of tools at their disposal. The recommended collaboration support environment contained integrated chat tools for synchronous communication, and a forum for asynchronous communication was also provided. Many students negotiated parts of the problem through chat messages conveyed through external chat tools and then used the collaboration support environments to finalise the problem solution. Others worked on their own and sent a proposed solution to their partners by email. Ignoring the fact that students used tools other than the suggested ones, or underestimating the importance of the information conveyed through them, would prevent a researcher from thoroughly understanding the studied activities. However, even if one is aware of that problem, it is impossible to gather all the data of student communication. In addition to the technical problems, students cannot always be expected to be willing to report such exchanges to their supervisors, for privacy reasons. Moreover, even if one manages to gather all logged data (regardless of the tool that produces them), that may still not be enough to gain a thorough view of the activity. Students may consult external resources while collaborating (e.g., books, the web) in order to find information. They may also involve themselves in individual tasks that help them learn. No information on such individual activities can be gained from any kind of log file. In the study reported here, it was found that in many cases students worked on their own for some time and were then involved in collaborative sessions. At the beginning of these sessions, they negotiated their individually produced partial solutions of the problem. That is a general problem when analysing collaborative activities, and especially asynchronous ones. Not all knowledge gained is a product of collaboration. In most cases, collaborative sessions interplay with individual learning, leading to learning results that cannot be easily attributed to one practice or the other.

Requirements of Mobile Learning Activities – An Example

In recent years, collaborative learning practice has increasingly favoured the use of handheld devices. Future classrooms are likely to be organized around wireless Internet learning devices that will enable a transition from occasional, supplemental use of learning technology in real-world education to frequent, integral use (Roschelle & Pea, 2002). This constitutes a major shift in CSCL practice from many perspectives. First, a wide range of different sources of information and knowledge may be available to students participating in the same activity. Control over the software used and the modes of communication between students would be very difficult. Moreover, the way that multiple sources of knowledge interplay would not be easily determinable. Given, in addition, the peer-to-peer communication architectures that are likely to prevail in handheld device interactions, the goal of logging all data and integrating them is rather unrealistic.
In addition, when analyzing such cases one has to face the same problems as with classic face-to-face collaborative activities. The above reasons justify the claim that analysis of log files of handheld device use is inadequate for a thorough analysis of mobile learning activities. To give a simple example of such limitations, we describe the experience of designing a collaborative learning activity for a traditional historical/cultural museum (Cabrera et al., 2005). The activity, based on a “Mystery in the Museum” story, involves collaboration of small groups of students through mobile handheld devices. An application was built that permits authoring of such activities, and a usability evaluation study was performed that revealed some of the limitations of the design. The plot involved a number of puzzles that relate to the exhibits of the museum and whose solution brings rewards to the players. These puzzles, the most typical examples of which involved scrambled images of certain exhibits and verses found in manuscripts of the museum, necessitate collaboration for their solution, as the necessary pieces were spread across the mobile devices of the members of the group (see Figure 3). A negotiation phase then followed, resulting in an exchange of items that could lead the group to a solution of the particular puzzle. The rewards had the form of clues that help the players solve the mystery. Since a large number of children (e.g., a school party) may be organized in multiple groups, the intention was to create competition among the different groups. The aim of the activity was to mix the real and the virtual world and to make children work together in a collaborative way in this setting. For moving from evaluation of the technology used to evaluation of collaborative learning, log file analysis cannot offer much. Table 1 summarizes calculations based on action logs, as reported in (Stoica, Fiotakis, Simarro Cabrera, Muñoz Frutos & Avouris, 2005). Such measures offer only indications of extreme cases of failure, like an unwillingness to work on the task. However, no significant findings can be deduced from such measures. In a later section, we present an alternative approach to analysis that helps shed light on cases like this.

Methodological Concerns

A serious shortcoming of log file analysis concerns the interpretation of the meaning of the unit of analysis and of the values of quantitative indicators. For instance, some chat messages logged by a tool used in a CSCL activity may be unimportant, although they are annotated according to a coding schema and counted in certain indicators. Moreover, action counts may include routine actions as well as crucial ones, weighted equally. Such issues reveal that quantitative measures based on log file events have little reliability if they aim to test hypotheses based on assumptions about the meaning of certain logged actions. Therefore, the recommended methodologies for the analysis of CSCL activities are mostly of a qualitative nature, based on unstructured data, as discussed in the following.
Figure 3. The screenshots of the handhelds of two partners during the puzzle activity

Table 1. Statistics of logged actions for three groups, G = Group ID, P = Profile (task)
Analysis of Computer-Supported Collaborative Learning activities constitutes a research field that bears many methodological similarities with other domains of computer-aided learning. As stated above, what is learned by one participant has to be communicated to others, providing valuable information to researchers. The core object of research is the interpretation of collaborative interactions. For this purpose, methods from the fields of ethnomethodology (Garfinkel, 1967), conversation analysis (Edwards & Potter, 1992), interaction analysis (Jordan & Henderson, 1995), video analysis (Heath, 1986) and ethnography (Hammersley, 1992) are applied. Most of these methodologies demand that the researchers be immersed in the culture of the students, and they stress the determining role that context plays in the learning activity. For the analysis of the activities, other sources of data, in addition to log files, should be available to researchers. Video capture is one of the most important ones. Furthermore, observation notes, audio recordings and snapshots may be useful. In order not to lose the benefits that log file data provide for analysis, but also to overcome the limitations of this approach, in the next section we propose an alternative method of analysis with the aid of an innovative analysis tool.

Interrelation of the Log File to Other Behavioural Data in ColAT

It should be observed that structured data, like a typical log file, usually take the form of an ordered list of events that occurred at the user interface of a software tool. Such a log contains a record of the activity of one or more learning actors, from the rather restrictive point of view of their fingertip actions. However, a lot of contextual information relating to the activity, as well as results of the activity in print or other forms and oral communication among the actors, is not captured through this medium. So, in this section we present an analysis environment that permits the integration of multiple media collected during learning activities and allows the application of the qualitative methodologies discussed in the Methodological Concerns section of this article. The Collaboration Analysis Tool (ColAT) is the environment that is used for building an interpretative model of the activity in the form of a multilevel structure, following an Activity Theory approach (Bertelsen & Bodker, 2003), incorporating pointers and viewers of various media. ColAT permits the fusion of multiple data by interrelating them through the concept of universal activity time. Figure 4 shows an example of the creation of a new analysis project and the interrelation of multiple sources of data. The analysis process during this phase involves interpretation and annotation of the collected data, which takes the form of a multilevel description of the activity. The ColAT tool, discussed in more detail in (Avouris, Komis, Margaritis & Fiotakis, 2004), uses the form of a theatre's scene, in which one can observe the activity by following the plot from various standpoints. The
Operations view permits the study of the details of action and interaction, as recorded by a log file, while other media, most typically video and audio recordings, capture dialogues and other behavioural data of the actors (posture, gestures, facial expressions, etc.), and media like screen snapshots and PDF files record intermediate or final outcomes of the activity. The automatically generated log of behavioural data can be expanded in two ways:
• First, by introducing additional events as they are identified in the video and other media, and by associating comments and static files (results, screen snapshots, etc.) with specific time-stamped events.
• Second, more abstract interpretative views of the activity may be produced: the Actions view permits the study of purposeful sequences of actions, while the Activity view interprets the activity at the strategic and motivational level, where decisions on collaboration and the interleaving of various activities are most clearly depicted.
This three-level model is built gradually. The first level, the Operations level, is directly associated with the log files of the main events, produced and annotated, and is related through the time stamps to media like video. The second level describes Actions at the actor or group level, while the third level is concerned with the motives of either individual actors or the group.
Figure 4. The ColAT environment: Project definition in which multiple log files and video/audio sources are synchronized by defining their corresponding time offsets
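A project definition like the one in Figure 4 essentially attaches a time offset to every source, so that all records can be placed on a single, universal time line. The sketch below illustrates this idea; the data structures, names and offsets are assumptions made for illustration, not ColAT's internal format.

```python
# Per-source offsets (in seconds) relative to the universal activity time,
# as they would be entered when defining an analysis project.
offsets = {
    "synergo_log": 0,      # the log file defines the reference time line
    "video_left": -12,     # this camera was started 12 s before the activity
    "audio_room": 95,      # this recorder was started 95 s after the activity
}

def to_universal(source: str, local_seconds: float) -> float:
    """Convert a timestamp local to one source into universal activity time."""
    return local_seconds + offsets[source]

# A log event at 7348 s and a video frame at 7360 s refer to the same moment:
print(to_universal("synergo_log", 7348))  # 7348.0
print(to_universal("video_left", 7360))   # 7348.0
```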
In Figure 5, the typical environment of the ColAT tool for the creation and navigation of a multi-level annotation and the associated media is shown. The three-level model, discussed in more detail in the following, is shown on the right side of the screen, while the video/audio window is shown on the left-hand side. One other feature shown in Figure 5 is the viewer filter, through which a subset of the activity can be presented, related to specific actors, tools or types of events. So, for example, the log file events related to a specific actor may be shown, or the actions related to a specific tool, or a specific kind of operations. A more detailed description of the multilevel representation of the activity shown in Figure 5 is provided next. The original sequence of events contained in the log file is shown as level 1 (the Operations level) of this multilevel model. The format of the events of this level, in XML, is that produced by Synergo, ModellingSpace, CollaborativeMuseumActivity and other tools that adhere to this data interchange format (Kahrimanis et al., 2006). Thus the output of these environments can feed into ColAT as the first-level structure. A number of such events can be associated with an entry at the Actions level (level 2).
Figure 5. The ColAT environment: Multi-level view of a problem solving activity. (The extract is from the study of learning activities in a museum, discussed in this article and in Cabrera et al., 2005)
Such an entry can have the following structure: {<id>, <time interval>, <entry type>, <actors>, <comment>}, where <id> is a unique identity of the action, <time interval> is the period of time during which the action took place, <entry type> is a classification of the entry according to a typology defined by the researcher, <actors> lists the actors that participated in the activity, and <comment> is a textual comment or attributes that are relevant to this type of action entry. Examples of entries at this level are: “Actor X inserts a link,” or “Actor Y contests the statement of Actor Z.” In a similar manner, the entries of the third level (the Activity level) are also created. These are associated with entries of the previous Actions level (level 2). The entries of this level describe the activity at the strategy level, as a sequence of interrelated goals of the actors involved or jointly decided. This is an appropriate level for the description of plans, from which coordinated and collaborative activity patterns may emerge. In each of these three levels, a different event typology for the annotation of the entries may be defined. This may relate to the domain of the observed activity or to the analysis framework used. For entries of level 1 the Object-oriented Collaboration Analysis Framework (OCAF) event typology (Avouris, Dimitracopoulou & Komis, 2003) has been used, while for the action and activity levels different annotations have been proposed. In Figure 6, the tools for the definition of the annotation scheme for actions and the identities of actors and tools in ColAT are shown. The various unstructured media, like video or audio, that can be associated with logged events through ColAT can be played back from any level of this multi-level model of the activity. As a result, the analyst can decide to view the activity from whatever level of abstraction he/she wishes, for example playing back the activity by driving a video stream from the operations, actions or activity level. In this way the developed model of the activity is directly related to the observed field events, or to their interpretation. Other media, like still snapshots of the activity or of a solution built for a given problem, may also be associated with this multilevel model. Any such image may be associated through a timestamp with a point in time, or a time interval, for which the image is valid. Any time the analyst requests playback of the relevant sequence of events, the still images appear in the corresponding window. This facility may be used to show the environment of various distributed users during collaboration, as well as the tools and other artefacts used.
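The multilevel structure described above can be pictured as three linked layers, with each Actions-level entry grouping a set of Operations-level events and each Activity-level entry grouping Actions. The sketch below is only an illustration of that entry structure; the field names follow the description in the text and are not ColAT's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ActionEntry:
    """A level-2 (Actions) entry: {id, time interval, entry type, actors, comment}."""
    entry_id: int
    time_interval: Tuple[float, float]   # (start, end) in universal activity time
    entry_type: str                       # typology defined by the researcher
    actors: List[str]
    comment: str = ""
    operation_ids: List[int] = field(default_factory=list)  # linked level-1 events

@dataclass
class ActivityEntry:
    """A level-3 (Activity) entry describing a goal of the actors or the group."""
    entry_id: int
    goal: str
    action_ids: List[int] = field(default_factory=list)     # linked level-2 entries

insert_link = ActionEntry(14, (7340.0, 7360.0), "Insertion", ["Actor X"],
                          "Actor X inserts a link", operation_ids=[623])
negotiate = ActivityEntry(3, "Negotiate the structure of the map", action_ids=[14])
```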
Figure 6. Definition of (a) tools used, (b) actors, and (c) typology of events relating each type of event to a specific color code, in ColAT
Observer comments related to events can also be inserted and shown in the relevant window, as shown in the bottom left corner of Figure 5. The possibility of viewing a process through various media (video, audio, text, log files, still images), from various levels of abstraction (operation, action, activity), is an innovative approach. It combines in a single environment the hierarchical analysis of a collaborative activity, as proposed by Activity Theory, with the sequential character of behavioural data.

Validation Studies

The discussed tools have been used in a number of studies that involved the analysis of evidence of technology-supported learning activities collected in various forms. Three such studies are briefly presented here. In the study reported in (Fidas, Komis, Tzanavaris & Avouris, 2005), data were collected from groups of students (15-16 years old) of a Technical Lyceum interacting through ModelsCreator3, a collaborative modelling environment. Interaction between distant group members was mediated by a chat tool, while interaction between group members located in front of the same workstation was mainly direct conversation. Interaction in the first case was captured through the ModelsCreator3 log file, which conforms to the ColAT format, while the latter was captured through audio recording. By associating the two data sources, a valuable comparison of the content of the interaction conducted through the network with the dialogues of the group members was performed. The educational process was thus discussed along various dimensions, such as group composition, task control, content of communication, the roles of the students and the effect of the tools used. In these studies, various features of the analysis tools presented here have been used. First, the tools were used for playback and annotation of the activity. Subsequently, the audio and sequences of still images, along with the log files of the studies, were inserted into the ColAT environment, through which the goal structures of the activities were constructed and studied. In (Voyiatzaki, Christakoudis, Margaritis & Avouris, 2004), a study is discussed of activities that took place in a computer lab of a Junior High school, using the collaboration environment Synergo. The activity involved the exploration by pairs of pupils of a simple algorithm flow chart and the negotiation of its correctness through the chat tool. The log files of Synergo were analysed along with contextual information in the form of a video recording of the whole classroom during the activity and observers' notes. These two data sources were interrelated, and through this process the verbal interventions of the tutor were identified and their effect on the students' problem solving process was studied. This study identified patterns of pupils' reactions to the tutoring activity. Finally, in a third case, the collaborative learning activity about a mystery play in a museum using PDAs was studied (Cabrera et al., 2005; Stoica et al., 2005).
In this study, a log file from the museum server was studied in relation to three streams of video, shot from different angles, together with the observers' notes. It was found that various events related to the interaction of the students with the exhibits, and the verbal interactions of the students among themselves and with their tutor/guide, were captured in the video streams and were interrelated with the actions at the user interface level of the various PDAs, which were automatically recorded by the software application used. In this particular study it was found that the additional information conveyed by the posture of the users and their spatial location was important for studying and understanding the activity, while the limited size of the portable devices and the technical limitations of monitoring the PDA screens during the activity made the video streams and the interrelated logged events on the server side the most valuable sources of information. A summary of the studies presented and briefly discussed above is included in Table 2.
Figure 7. A view of the lab and a snapshot of a pupil workstation during the activity of the study reported by Voyiatzaki et al. (2004). The pupils, in pairs, had to explore a simple algorithm flow chart and negotiate its correctness through the chat tool
Table 2. Summary of the Presented Case Studies

Fidas et al. 2005
Setting: Technical Lyceum, Information Technology class (15-16 year old), 20 pupils
Data sources: Logfiles, observer notes, audio
Mode of collaboration: ModelsCreator3 through the network, and face to face
Use of ColAT: Interrelation of computer-based activity and recorded face to face interaction; patterns of collaboration emerged

Voyiatzaki et al. 2004
Setting: Junior High School, Computer Lab (14-15 year old), 20 pupils
Data sources: Logfiles, video, observer notes, activity sheets
Mode of collaboration: Synergo through the network, with tutor intervention
Use of ColAT: The teacher intervention was recorded on video and its effect on the students' activity was identified

Stoica et al. 2005, Cabrera et al. 2005
Setting: Historical/Cultural Museum activity, school party (15 year old), 12 pupils
Data sources: Logfiles, 3 video streams, observer notes
Mode of collaboration: Face to face, using wireless network-enabled PDAs
Use of ColAT: Students' gestures, posture and face to face interaction captured on video and interrelated to the logs of the PDAs and to screenshots
In the three studies, the common characteristic was that, in order to analyse the studied activities effectively and test their hypotheses, the analysts used additional evidence in various forms, mostly video and audio. These were added to the log files generated by the software tools used (chat messages exchanged, actions on concept mapping tools, etc.) and were interrelated with them. The analysis environment ColAT that was used in these cases facilitated and effectively supported the analysis and evaluation task, as described in more detail in the three study reports (Fidas et al., 2005; Voyiatzaki et al., 2004; Stoica et al., 2005).

CONCLUSIONS
In this article, we discussed the limitations of the current practice of log file analysis and the need to use multiple sources of data when studying collaborative learning activities. First, a typical case of log file-based analysis was presented, using the Synergo analysis tool as an example. Subsequently, the limitations of such an approach were discussed, in particular in relation to the requirements of collocated activities, distant collaboration activities and activities involving the use of handheld devices. Finally, the Collaboration Analysis Tool (ColAT), which permits the fusion and interrelation of multiple sources of data on collaborative activities, was presented, and examples of its validation studies were discussed. The log file analysis approach uses as its main source of data the log files of events generated by user operations in a Collaborative Learning environment, like Synergo.
In this case, playback and the visualisation of statistical indicators were used in order to reconstruct the problem solution and view the partners' contributions in the activity space. However, it was found that such an approach is often not adequate for a complete reconstruction of the learning activity, as essential contextual information beyond the users' fingertip actions was missing. The second approach involves multiple interrelated sources of data. It also involves building a multilevel interpretation of the solution, starting from the observable events and leading to the cognitive level. This is done by using a combination of multiple media views of the activity. Through this, a more abstract description of the activity can be produced and analysed at the individual as well as the group level. It should be observed that the two presented approaches are complementary in nature; the first is used for building a quantitative view of problem solving at the user interface level, while the second leads to more interpretative structures, as it takes into account additional contextual information in the form of various other media. The result of the first phase can feed the second one, in which case the annotated log file is just one source of information. The two presented tools are quite independent, since their use depends on the available data. The Synergo Analysis Tool is mostly related to the Synergo synchronous problem-solving environment, while the ColAT tool is more generic and can be used for studying any kind of learning activity that has been recorded in multiple media and has produced both structured data (e.g., log files) and unstructured data (e.g., text, video, images). The extracts of the three studies demonstrated that there are many issues relating to the analysis of interaction that necessitate multiple perspectives. Audio recordings of oral communication, video of the whole class or of a group of students, and observer notes had to be used for interpreting and understanding the fingertip events recorded in the log files. Thus, analysis tools like ColAT, which interrelate log files and contextual information in these different forms, proved indispensable for supporting and facilitating the analysis of activity in these studies.

References

Avouris, N. M., Dimitracopoulou, A., & Komis, V. (2003). On analysis of collaborative problem solving: An object-oriented approach. Computers in Human Behavior, 19(2), 147-167.
Avouris, N., Margaritis, M., & Komis, V. (2004). Modelling interaction during small-group synchronous problem-solving activities: The Synergo approach. 2nd Int. Workshop on Designing Computational Models of Collaborative Learning Interaction, ITS2004, Maceio, Brasil, September 2004. Retrieved June 29, 2006, from http://hci.ece.upatras.gr
Avouris, N., Komis, V., Margaritis, M., & Fiotakis, G. (2004). An environment for studying collaborative learning activities. Journal of International Forum of Educational Technology & Society, 7(2), 34-41.
Bertelsen, O. W., & Bodker, S. (2003). Activity theory. In J. M. Carroll (Ed.), HCI models, theories and frameworks. San Francisco, CA: Morgan Kaufmann.
Cabrera, J. S., Frutos, H. M., Stoica, A. G., Avouris, N., Dimitriadis, Y., Fiotakis, G., & Liveri, K. D. (2005). Mystery in the museum: Collaborative learning activities using handheld devices. Proc. 7th Int. Conf. on Human Computer Interaction with Mobile Devices & Services, Salzburg, Austria, September 19-22, 2005, vol. 111. New York, NY: ACM Press, 315-318.
Dillenbourg, P., Baker, M., Blaye, A., & O'Malley, C. (1996). The evolution of research on collaborative learning. In E. Spada & P. Reimann (Eds.), Learning in humans and machine: Towards an interdisciplinary learning science. Oxford: Elsevier, 189-211.
Dix, A., Finlay, J., Abowd, G., & Beale, R. (2004). Human-computer interaction (3rd ed.). Prentice Hall.
Edwards, D., & Potter, J. (1992). Discursive psychology. London: Sage.
Fesakis, G., Petrou, A., & Dimitracopoulou, A. (2003). Collaboration activity function: An interaction analysis tool for computer-supported collaborative learning activities. In 4th IEEE International Conference on Advanced Learning Technologies (ICALT 2004), August 30 - September 1, 2004, Joensuu, Finland, 196-200.
Fidas, C., Komis, V., Tzanavaris, S., & Avouris, N. (2005). Heterogeneity of learning material in synchronous computer-supported collaborative modelling. Computers & Education, 44(2), 135-154.
Garfinkel, H. (1967). Studies in ethnomethodology. Englewood Cliffs, NJ: Prentice-Hall.
Gassner, K., Jansen, M., Harrer, A., Herrmann, K., & Hoppe, H. U. (2003). Analysis methods for collaborative models and activities. In B. Wasson, S. Ludvigsen, & U. Hoppe (Eds.), Designing for change in networked learning environments, Proc. CSCL 2003. Dordrecht, Netherlands: Kluwer Academic Publishers, 411-420.
Hammersley, M. (1992). What's wrong with ethnography? Methodological explorations. London: Routledge.
Harrer, A., Kahrimanis, G., Zeini, S., Bollen, L., & Avouris, N. (2006). Is there a way to e-Bologna? Cross-national collaborative activities in university courses. Proceedings 1st European Conference on Technology Enhanced Learning (EC-TEL), Crete, October 1-4, 2006, Lecture Notes in Computer Science, vol. 4227. Berlin: Springer, 140-154.
Heath, C. (1986). Video analysis: Interactional coordination in movement and speech. In Body movement and speech in medical interaction (pp. 1-24). Cambridge, UK: Cambridge University Press.
Heraud, J. M., Marty, J. C., France, L., & Carron, T. (2005). Helping the interpretation of web logs: Application to learning scenario improvement. Proc. Workshop Usage Analysis in Learning Systems, AIED 2005, Amsterdam, July 2005. Retrieved June 29, 2006, from http://liumdpuls.iut-laval.univ-lemans.fr/aied-ws/PDFFiles/heraud.pdf
Hulshof, C. D. (2004). Log file analysis. In K. Kempf-Leonard (Ed.), Encyclopedia of social measurement. Elsevier, 577-583.
Jordan, B., & Henderson, A. (1995). Interaction analysis: Foundations and practice. Journal of the Learning Sciences, 4(1), 39-103.
Kahrimanis, G., Papasalouros, A., Avouris, N., & Retalis, S. (2006). A model for interoperability in computer-supported collaborative learning. Proc. ICALT 2006 - The 6th IEEE International Conference on Advanced Learning Technologies, July 5-7, 2006, Kerkrade, Netherlands. IEEE, 51-55.
Reffay, C., & Chanier, T. (2003). How social network analysis can help to measure cohesion in collaborative distance-learning. Computer-Supported Collaborative Learning. Bergen, Norway: Kluwer Academic Publishers. Retrieved June 29, 2006, from http://archive-edutice.ccsd.cnrs.fr
Roschelle, J., & Pea, R. (2002). A walk on the WILD side: How wireless handhelds may change computer-supported collaborative learning. International Journal of Cognition and Technology, 1(1), 145-168.
Stahl, G. (2001). Rediscovering CSCL. In T. Koschmann, R. Hall, & N. Miyake (Eds.), CSCL2: Carrying forward the conversation. Hillsdale, NJ: Lawrence Erlbaum Associates, 169-181.
Stoica, A., Fiotakis, G., Simarro Cabrera, J., Muñoz Frutos, H., Avouris, N., & Dimitriadis, Y. (2005). Usability evaluation of handheld devices: A case study for a museum application. Proceedings PCI 2005, Volos, Greece, November 2005. Retrieved June 29, 2006, from http://hci.ece.upatras.gr
Voyiatzaki, E., Christakoudis, C., Margaritis, M., & Avouris, N. (2004). Teaching algorithms in secondary education: A collaborative approach. Proceedings ED-MEDIA 2004, AACE, Lugano, June 2004, 2781-2789.
Monitoring an Online Course with the GISMO Tool: A Case Study

RICCARDO MAZZA
Institute of Communication Technologies, Switzerland
[email protected]

LUCA BOTTURI
NewMinE Lab, Switzerland
[email protected]
This article presents GISMO, a novel, open source, graphic student-tracking tool integrated into Moodle. GISMO represents a further step in information visualization applied to education, and also a novelty in the field of learning management systems applications. The visualizations of the tool, its uses and the benefits it can bring are illustrated through a detailed case study of an online course. GISMO provides visualizations of behavioral, cognitive and social data from the course, allowing constant monitoring of students’ activities, engagement and learning outcomes.
Introduction

Learning Management Systems (LMS), often called “Virtual Learning Environments,” are a new class of software applications that have been developed during the last ten years, following the growing adoption of e-learning in universities, schools, and companies. LMS provide a convenient web-based environment where instructors can deliver multimedia content materials to the students, prepare assignments and tests, engage in discussions, and manage classes at a distance (McCormack & Jones, 1997). Thanks to computer-based communication, the students may access the course, study, and perform the interactive learning activities with fewer time and space restrictions. One of the problems that students may face when learning online is the lack of support from the instructors and from other peers in the course. Because of the nature of computer-mediated communication, students tend to study
alone at home with few (or even no) interactions with others. In such a situation, the role of the course tutor is critical. He/she has to monitor the students' activities, provide support to the learners who may need it, and facilitate the learning process. Activities such as answering questions, promoting discussions, monitoring the learners' progress, and testing the acquired knowledge and skills on a regular basis are very important for successful online tutoring practice (Helic, Maurer, & Scherbakov, 2000; Cotton, 1988; Ragan, 1998). One of the best tutoring approaches is based on understanding the needs of individual learners in order to provide adapted help. Regularly monitoring the students' activities and being aware of what the students are doing in the course are essential conditions for providing adaptive and effective tutoring. Typical questions that the tutor has to address are: “Are students participating in discussions?”, “Have they read the course materials?”, and “How well do they perform on quizzes?” However, these sorts of monitoring activities are not easy to accomplish with common LMS. Although generic LMS are very effective in facilitating the delivery of distance courses, they provide very little help to instructors in gaining an understanding of the cognitive and social processes in distance classes. For example, questions like “Did a student access the whole course materials? When did s/he do it?” or “Has a student participated in discussions regularly?” are difficult to answer appropriately using the student activity tracking systems of conventional LMS. In this article, we describe the adoption of a graphical tool for the monitoring of a course given at a distance. GISMO is a tool that visually represents the tracking data collected by the learning management system and can be used by an instructor to gain an understanding of his/her students and become aware of what is happening in distance classes. Tracking data provided by the LMS is a valuable source of data that can be used by the instructor of the course for the usage analysis of contents, resources, and activities. However, tracking data is complex and usually organized in some tabular format, which in most cases is difficult to follow and inappropriate for the instructors' needs (Mazza, 2004). GISMO instead adopts the Information Visualization paradigm (Spence, 2001; Card, Mackinlay, & Shneiderman, 1999), which consists of presenting data in a visual form and relying on the perceptual abilities of human vision for its interpretation. A well-constructed graphical representation of data may allow the instructor to develop a deeper understanding of the data and immediately discover individuals who may need particular attention. In this article, we will show how GISMO has proven to be useful in several teaching activities performed at a distance. The article is organized as follows: The next section presents a generic description of the tool. We then illustrate how GISMO was used in a real course to support the tutoring activities. A subsequent section describes some related works that use visualizations to represent students' data. Finally, we draw some conclusions and outline some directions for future work.
GISMO – a Graphical Interactive Student Monitoring System

GISMO is a tool that was implemented as a follow-up to previous research on using Information Visualization approaches to support instructors in web-based distance learning (Mazza, 2004; Mazza & Dimitrova, 2004). GISMO implements some of the visualizations found useful by teachers in our experience with the CourseVis research, within a new context, namely the Edukalibre project (Botturi et al., 2005) funded by the European Union. It is integrated into the Moodle LMS (Moodle, 2002) and is visible only to the instructors and tutors of courses, as an additional block. We considered the Moodle learning platform in this work primarily because it is the learning platform used in our university. Its Free and Open Source nature allowed the easy integration of GISMO. However, GISMO can be adapted to support other learning platforms, thanks to a software Application Programming Interface (API) whose role is to retrieve data that is usually present in a wide range of LMS. GISMO provides graphical representations of the information regarded as useful for instructors, which we identified with a survey submitted to instructors involved in distance learning in previous research (Mazza & Dimitrova, 2003). In particular, graphical representations can be produced on social aspects (discussions), cognitive aspects (results on quizzes and assignments), and behavioral aspects (accesses to the course). In the next section, we will illustrate some of GISMO's graphical representations on data collected from a real course, and we will describe some insights that can be derived from these representations.

Case Study

The use of GISMO might vary in different settings. This section analyzes the case of a completely online course in a master program.
Course Setting

The Master of Science in Communication major in Education and Training (MET – www.met.unisi.ch) at the University of Lugano, Switzerland, is a 2-year program focusing on adult learning, human resources, educational technologies and intercultural communication. Some of its courses are shared with a twin program run by the Università Cattolica in Milano, Italy. Each year, a class of about 15 students in Lugano collaborates online – both asynchronously and in videoconference sessions – with another 15 students in Milano. One of the courses in which collaboration is most evident is Instructional Design, a completely online course delivered by the University of Lugano and held by Prof. R. Kenny of Athabasca University in Canada. The setting is highly distributed: about 30 students are located in Lugano (15, plus the course teaching assistant) and Milano (15), and the instructor is in Canada.
The course, which spans 16 weeks (14 working weeks plus two holiday weeks over Christmas), aims to introduce students to Instructional Design techniques and to have them try some of these on a real project. The course is largely project-based: students are divided into groups of three to four and are assigned to a real instructional design and development project. Each group carries the design through a set of phases or steps according to the rhythm proposed by the course. Every second week, the class is engaged in an online discussion forum stimulating reflections on the work done thus far. The evaluation is composed of four parts: three group assignments on the project work (90%) and a personal forum participation score. The course is supported by Moodle (a screenshot is depicted in Figure 1), and the installation includes the GISMO module, which has proved extremely valuable for a number of critical issues of online learning.

The Benefits of GISMO

While the course instructor focuses on content production, moderating online discussions, and grading, the teaching assistant is in charge of monitoring the course, mentoring individuals and groups, and providing systematic feedback to the course instructor.
Figure 1. A screenshot of the main Moodle page of the online course on Instructional Design
To these ends, GISMO has proven to be a powerful tool for at least three activities: (a) monitoring class and individual behavior, (b) assessing participation in discussion forums, and (c) redesigning the course according to students' needs.

Monitoring class and individual behavior

The first – and probably most straightforward – benefit of GISMO is that it offers a synthetic view of the class behavior in terms of logins, access to resources, and participation in activities. While this might be trivial in a face-to-face or blended learning setting, it is critical in completely online learning: Is the class reacting well to the instructor's stimuli? Are they actually engaging with the course content? Figure 2 reports a graph of the accesses to the course. A simple matrix formed by the students' names (on the Y-axis) and the dates of the course (on the X-axis) is used to represent the course accesses. Each blue square represents at least one access to the course made by the student on the selected date. The histogram at the bottom shows the global number of hits on the course made by all students on each date. Figure 3 represents the global number of accesses made by students (on the X-axis) to all of the resources of the course. If the user clicks with the right mouse button on one of the bars of the histogram and selects the item “Details,” he/she can see the details of the accesses for a specific student.
Figure 2. A graph reporting the students’ accesses to the course
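The student/date access matrix of Figure 2 can be thought of as a simple aggregation over (student, date) pairs, plus a per-date total for the histogram. The sketch below reproduces that aggregation on invented records; it is not GISMO's code and does not query Moodle directly.

```python
from collections import defaultdict, Counter

# Illustrative access records: (student, date) -- one entry per hit.
hits = [
    ("anna", "2006-03-01"), ("anna", "2006-03-01"), ("anna", "2006-03-03"),
    ("marco", "2006-03-02"), ("marco", "2006-03-03"),
]

accessed = defaultdict(set)   # student -> set of dates with at least one access
hits_per_date = Counter()     # date -> total number of hits (histogram at the bottom)

for student, date in hits:
    accessed[student].add(date)
    hits_per_date[date] += 1

# A "blue square" is drawn for every (student, date) pair present in `accessed`.
print(sorted(accessed["anna"]))        # ['2006-03-01', '2006-03-03']
print(hits_per_date["2006-03-01"])     # 2
```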
Figure 4 shows such details, that is, the days on which the student visited a specific resource (with the graph on top) and how many accesses he/she made to all the resources on each day of the course (with the bar chart at the bottom).
Figure 3. A graph reporting the number of accesses made by the students to the resources of the course
Figure 4. A detail of the accesses performed by a student to the course’s resources
The advantages of GISMO in this respect are twofold: on the one hand, it offers a simple visualization of specific parameters (logins, access to resources), which is much more legible and readable than the long lists of log records available as a standard feature in Moodle and in most LMS; on the other, it allows the instructor to compare different aspects of a class's behavior and to refine what would otherwise be a generic assessment. In fact, it is one thing to ascertain that there is a high number of logins per week, and another to be able to see which students actually log in and what they do after logging in: Do they browse course resources? Do they post to or read forums? A quick comparison of the different visualizations in GISMO allows a structured assessment of a class's behavior to be formed. Notice that most of these activities are not impossible with standard tracking systems (cf. Figure 5) – they only take more time and require harder cognitive work in interpreting long lists of figures: they can hardly be scheduled as weekly activities, and are rather confined to post-course assessment. During the Instructional Design course, the teaching assistant used the GISMO overview tool every third week in order to monitor the class's progress in the activities. After a while, it was possible to identify two
Figure 5. The standard Moodle log window that displays all users’ activities
“slackers” who made a very low number of logins. This first observation was based on the raw access counts of Moodle; it was then developed into a more detailed analysis. For example, the two of them might have different problems preventing them from fully participating in the course. A cross-check with the detail visualizations (Figure 6) indicated that one student actually logged in three times at the beginning of the course, accessed some resources and (presumably) printed them out; that student also took part in discussion forums, which she then abandoned. On the other hand, the second student logged in frequently during the first two weeks and then abandoned the course. Constant monitoring over time indicated that this student probably suffered from work overload from other courses: he/she actually came back to the course after some weeks, took the time to revise all past forum messages, and explained that he/she had had to submit his/her final bachelor's thesis. Checking this with GISMO only took a few minutes, while it would have been much more difficult without the graphs.

Assessing Participation in Discussion Forums

The evaluation of forum participation occupies an important position in the Instructional Design course: it is the only personal mark that contributes to the final grade, and it is the result of continuing and potentially hard work, reading other students' posts and writing one's own. The requirement for getting full credit for forum participation is posting at least two substantial messages (i.e., something more than “Great! I also agree with that”) in four out of six discussion forums. Checking this without a visual interface requires a huge amount of work, reading messages one by one. GISMO does not provide complete support for this, but in the case of the Instructional Design course, (a) it allowed identifying at a glance potential “slackers” who did not reach the minimum required number of messages, and (b) it made it easy to retrieve additional data on those cases. Of course, this does not eliminate the fact that actually reading the messages is paramount in order to assess the quality of the contributions.
Figure 6. Details of the accesses for two students of the course
Figure 7 reports a chart in which instructors have an overview of all of the discussions in which students participated. For each student of the course, the chart indicates the number of messages posted (with a red square), the number of messages read (with a blue circle), and the number of threads started by the student in the discussions (with a green triangle). The GISMO interface allows the teaching staff to identify at once people who post “because you need to” without actually being involved in the discussion and reading others' postings. The information that the GISMO visualizations provide on this point gave a good indication of the general trend of the class and supported the more fine-grained activity of grading, especially in borderline cases. For example, the course instructor learned by heart the names of those making the most relevant contributions to the discussion or those posting most often. The issues in grading concern people who did not demonstrate a particular behavior – and GISMO provided sound support in assessing those cases. Moreover (and quite differently from the usual LMS built-in tracking systems), GISMO allows the filtering of a single student's data in order to reconstruct her/his personal behavior, thus allowing the teaching staff to identify critical cases of different types. For example, an online discussion can suffer and be brought off-track by super-posters who do not take enough time to read the course content or only occasionally participate in discussions, reading few messages and writing long ones. Also, some students might be more reflective and contribute few postings while actually reading the forums regularly and browsing all the available content.
Figure 7. Graphical representation of discussions performed in a course
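The chart of Figure 7 combines three per-student counts (messages posted, read, and threads started), and the course rule quoted above can be expressed as a simple check over the posts. The sketch below illustrates that check on invented data; it is not part of GISMO, and the two-messages-in-four-forums threshold is simply the grading rule described in the text.

```python
from collections import defaultdict

# Illustrative posts: (student, forum_id) -- one entry per substantial message.
posts = [
    ("anna", 1), ("anna", 1), ("anna", 2), ("anna", 2),
    ("marco", 1), ("marco", 3),
]

per_forum = defaultdict(lambda: defaultdict(int))  # student -> forum -> message count
for student, forum in posts:
    per_forum[student][forum] += 1

def meets_requirement(student: str, min_msgs: int = 2, min_forums: int = 4) -> bool:
    """At least `min_msgs` messages in at least `min_forums` of the six forums."""
    counts = per_forum[student]
    return sum(1 for c in counts.values() if c >= min_msgs) >= min_forums

for s in per_forum:
    print(s, "ok" if meets_requirement(s) else "below the minimum")
```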
Redesigning the Course According to Students' Needs

A final critical point in online courses is redesign (Botturi, 2003): What was actually useful to students? What did they actually use? Given that the production of digital materials for online courses is costly, having clear information about this is paramount in order to make correct decisions. GISMO's visualizations offer a set of synthetic and detailed views about resources (namely, the resource access count, the student/resource matrix, and the detailed view of each resource's use). Figure 8 presents a histogram of the number of accesses made by students to each resource. Figure 9 reports an overview of students' accesses to the resources of the course, with student names on the Y-axis and resource names on the X-axis. A mark is depicted if the student accessed the resource, and the color of the mark ranges from light blue to dark blue according to the number of times he/she accessed it. Figure 10 reports a detailed view of the usage of a specific resource. The cross-reading of these data allowed the identification of four kinds of resources (see the sketch after this list):
1. Poorly-used resources (those probably less integrated into the course mainstream, or even pointless);
2. Resources only used during specific timeframes, which could then be hidden in order to simplify the course interface;
3. Resources accessed by all, but only a few times;
4. Resources accessed often by all.
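Several of these categories can be approximated from two numbers per resource: how many students ever opened it and how often it was opened on average (the "specific timeframe" case additionally needs access dates). The thresholds in the sketch below are invented for illustration; in the course, the actual judgement was made by reading the GISMO views, not by an automatic rule.

```python
def categorize(accessing_students: int, total_accesses: int, class_size: int) -> str:
    """Rough, assumed mapping of access counts to the categories discussed above."""
    coverage = accessing_students / class_size              # share of students who opened it
    per_student = total_accesses / max(accessing_students, 1)
    if coverage < 0.3:
        return "poorly used"
    if coverage >= 0.9 and per_student < 2:
        return "accessed by all, but only a few times"
    if coverage >= 0.9:
        return "accessed often by all"
    return "used by part of the class"

print(categorize(accessing_students=4, total_accesses=5, class_size=30))
print(categorize(accessing_students=29, total_accesses=120, class_size=30))
```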
Figure 8. The resource accesses count overview, providing for each resource the number of accesses made
Figure 9. A graph reporting an overview of students’ accesses to resources of the course
Figure 10. Details of the accesses to a specific resource of the course

For example, the instructors expected that the page explaining how the assignments work would be accessed at the beginning of the course, printed out, and then used as a reference (see Figure 10). Actually, this was the most-
accessed resource during the whole course. Also, before the deadline for assignment submission, some students asked for a sample submission to see what they were expected to do. The sample assignments were prepared and put online, and GISMO allowed checking whether they were used only by the few students who asked for them or by everybody (which was the case), and when they were accessed. It was therefore decided to include them in the course area right from the start. Also, technically similar resources (e.g., two webpages, or two PDF files) were used differently according to their content: some were printed, while others were accessed multiple times. The graph in Figure 11 visually indicates the submission dates of the assignments. Vertical lines correspond to the deadlines of each assignment given to the students (represented here on the Y-axis). Lines and marks have different colors for different assignments to help the reader locate the marks corresponding to each one. In this example, it can be clearly seen that almost all submissions were late: the students, in fact, asked for an extension of the submission deadline, which was accepted because they had deadlines from other courses at the same time.
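The pattern visible in Figure 11 (marks falling to the right of the vertical deadline lines) amounts to comparing each submission timestamp with the assignment deadline. The sketch below shows that comparison on invented dates and group names; it is not GISMO code.

```python
from datetime import datetime

# Hypothetical deadlines and submissions (ISO date strings).
deadlines = {"A1": "2006-03-20", "A2": "2006-04-24"}
submissions = [
    ("group1", "A1", "2006-03-23"),
    ("group2", "A1", "2006-03-19"),
    ("group1", "A2", "2006-04-27"),
]

for group, assignment, submitted in submissions:
    delay = (datetime.fromisoformat(submitted)
             - datetime.fromisoformat(deadlines[assignment])).days
    status = f"{delay} day(s) late" if delay > 0 else "on time"
    print(group, assignment, status)
```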
Some Issues

The experiment done so far has shown that GISMO is a powerful tool for course management; it provided effective support in the Instructional Design course, allowing the tutor to get a more detailed picture of what was going on in the course. It has also indicated that the correct interpretation of some data and visualizations requires some learning and attention.
Figure 11. A graph reporting the submission time of assignments
First of all, it is important to distinguish resource types: PDF files can be accessed only once and printed, so that a single hit from a student might be all that one can expect. It is not the same for interactive Flash applications, which can be accessed only online. Secondly, it is important to understand the actual meaning of the data: for example, a single read-post count in Moodle actually means that a whole thread was accessed – in Moodle, the standard view of a forum post is embedded in a nested thread. Finally, different students have different needs, and the same behavior does not imply the same learning. For example, students with a less intense behavior cannot be assumed to learn less – and vice versa. GISMO visualizations are useful sources of data that must be interpreted carefully before being used for evaluation. For this reason, the choice in the Instructional Design course was to use GISMO mainly for monitoring the course during its delivery and for collecting data to support the students more effectively. GISMO was therefore not used directly for evaluation – which was project-based and supported by other tools available in Moodle – except for the simple support provided in assessing participation in discussion forums, as discussed above.

RELATED WORK
RELATED WORK

Recently, there has been growing interest in the analysis of learner interactions with LMSs. Among the different approaches, the idea of using information visualization is one of the newest. Some forms of visualizing cognitive aspects of the student model have been explored in the field of Intelligent Tutoring Systems (Zapata-Rivera & Greer, 2004; Kay, 1995; Hartley & Mitrovic, 2002). Some previous works have attempted to visualize the communication patterns of a course, identify recurring events and roles, and assess the degree of student engagement in social activities (Reffay & Chanier, 2002; Xiong & Donath, 1999). Simple visualization techniques have been used to build a tool that enables instructors to monitor distance students engaged in learning-by-doing activities in a specific domain (Despres, 2003). CourseVis (Mazza & Dimitrova, 2004) was our first attempt to explore information visualization techniques to graphically render complex, multidimensional student-tracking data, and it was the precursor to this research. The effectiveness of using visualization techniques to explore student-tracking data has already been demonstrated in the CourseVis research (Mazza, 2004) by an empirical evaluation that involved instructors with experience in distance education. That study showed that the graphical representations can help instructors to identify tendencies in their classes more quickly and accurately and to discover individuals who might need special attention. Among the different works that have been proposed on usage analysis in learning systems, none has found practical exploitation in widely used learning management systems. GISMO goes a step further and aims to become a widely adopted module in the Moodle community. In this spirit, GISMO follows the Open Source movement, and the prototype is released as Free Software under the GNU1 General Public License.

CONCLUSION AND FUTURE WORK
Based on our experience with the CourseVis research, we have designed GISMO, a tool that graphically represents student-tracking data collected by learning tools to help instructors become aware of what is happening in distance or blended learning classes. GISMO is designed to be a practical tool, easily integrated into a popular LMS, that can be used by instructors in realistic settings. It proposes graphical representations that can be useful for gaining insights into the students of a course. It is also a further step on the path traced by quite a large body of previous literature and research. The case study included in the article was aimed at illustrating how GISMO works, how it can be useful, and how it can be integrated into the day-to-day activity of online course management and delivery. GISMO is released as Free Software and can be freely downloaded from the project website (http://gismo.sourceforge.net). Several people around the world have contacted us and given interesting feedback that has allowed us to improve the tool. Our plan for the future is to continue receiving feedback from users and to improve the tool to better address usability and users' needs.

References

Botturi, L. (2003). E2ML - Educational environment modeling language. In P. Kommers & G. Richards (Eds.), Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2003 (pp. 304-311). Chesapeake, VA: AACE.
Botturi, L., Dimitrova, V., Tebb, C., Matravers, J., Whitworth, D., Geldermann, J., & Hubert, I. (2005). Development-oriented eLearning tool evaluation: The Edukalibre approach. In P. Kommers & G. Richards (Eds.), Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2005 (pp. 1104-1109). Chesapeake, VA: AACE.
Card, S. K., Mackinlay, J. D., & Shneiderman, B. (1999). Readings in information visualization: Using vision to think. San Francisco, CA: Morgan Kaufmann.
Cotton, K. (1988). Monitoring student learning in the classroom [Online]. School Improvement Research Series (SIRS). Northwest Regional Educational Laboratory, U.S. Department of Education. Available: http://www.nwrel.org/scpd/sirs/2/cu4.html
Despres, C. (2003). Synchronous tutoring in distance learning: A model for the synchronous pedagogical monitoring of distance learning activities. In U. Hoppe, F. Verdejo, & J. Kay (Eds.), Artificial Intelligence in Education, Proceedings of the 11th International Conference on Artificial Intelligence in Education, AIED2003 (pp. 271-278). Amsterdam, NL: IOS Press.
Hartley, D., & Mitrovic, A. (2002). Supporting learning by opening the student model. In S. Cerri, G. Gouardères, & F. Paraguacu (Eds.), Intelligent Tutoring Systems, Proceedings of ITS 2002, Biarritz (pp. 453-462). LNCS 2363. Berlin: Springer.
Helic, D., Maurer, H., & Scherbakov, N. (2000). Web based training: What do we expect from the system. In S. Young, J. Greer, H. Maurer, & Y. S. Chee (Eds.), Proceedings of the International Conference on Computers and Education (ICCE 2000), Taiwan, November 2000 (pp. 1689-1694). Charlottesville, USA: AACE.
Kay, J. (1995). The UM toolkit for cooperative user modelling. User Modeling and User-Adapted Interaction, 4(3), 149-196.
McCormack, C., & Jones, D. (1997). Building a web-based education system. New York, USA: Wiley.
Mazza, R. (2004). Using information visualisation to facilitate instructors in web-based distance learning. Unpublished doctoral thesis, University of Lugano.
Mazza, R., & Dimitrova, V. (2003). Informing the design of a course data visualisator: An empirical study. In C. Jutz, F. Flückiger, & K. Wäfler (Eds.), 5th International Conference on New Educational Environments (ICNEE 2003), Lucerne, 26-28 May (pp. 215-220). Bern: Sauerländer Verlage AG.
Mazza, R., & Dimitrova, V. (2004). Visualising student tracking data to support instructors in web-based distance education. In Proceedings of the 13th International World Wide Web Conference, Alternate Track Papers & Posters, Session: Student Tracking and Personalization (pp. 154-161). New York, USA: ACM Press.
Moodle (2002). A Modular Object-Oriented Dynamic Learning Environment. [Online] http://www.moodle.org
Reffay, C., & Chanier, T. (2002). Social network analysis used for modeling collaboration in distance learning groups. In S. Cerri, G. Gouardères, & F. Paraguacu (Eds.), Intelligent Tutoring Systems, Proceedings of ITS 2002, Biarritz (pp. 31-40). LNCS 2363. Berlin: Springer.
Ragan, L. C. (1998). Good teaching is good teaching: An emerging set of guiding principles and practices for the design and development of distance education [Online]. DEOSNEWS, The American Center for the Study of Distance Education, Pennsylvania State University, 8(12). http://www.vpaa.uillinois.edu/reports_retreats/tid/resources/penn-state.html
Spence, R. (2001). Information visualisation. Harlow, UK: Pearson Education Limited.
Xiong, R., & Donath, J. (1999). PeopleGarden: Creating data portraits for users. In Proceedings of the 12th Annual ACM Symposium on User Interface Software and Technology (pp. 37-44). New York, USA: ACM Press.
Zapata-Rivera, J. D., & Greer, J. (2004). Interacting with inspectable Bayesian student models. International Journal of Artificial Intelligence in Education, 14(2), 127-163. Amsterdam, NL: IOS Press.
Acknowledgements

This work was supported by the Swiss Federal Office for Education and Science (OFES), grant no. UFES SOC/03/47, and it is part of the research project "EDUKALIBRE, Libre software methods for E-Education," funded by the European Union in the years 2003-2005 (project no. 110330-CP-1-2003-1ES-MINERVA-M). Further information on the Edukalibre project can be found at http://www.edukalibre.org. The GISMO software can be downloaded from http://gismo.sourceforge.net. Thanks also to Christian Milani for the software implementation.

Notes

1. GNU is a recursive acronym for "GNU's Not UNIX"; http://www.gnu.org/
Usage Analysis in Learning Systems, 115-131
Matching the Performed Activity on an Educational Platform with a Recommended Pedagogical Scenario: A Multi-Source Approach

JEAN-CHARLES MARTY
University of Savoie, France
[email protected]

JEAN-MATHIAS HERAUD
Ecole Supérieure de Commerce, Chambéry, France
[email protected]

THIBAULT CARRON
University of Savoie, France
[email protected]

LAURE FRANCE
Ecole Supérieure de Commerce, Chambéry, France
[email protected]

The work reported here is in the educational domain and deals specifically with the observation of instrumented pedagogical activities. Our objective is to observe learner behaviour within a web-based learning environment and to understand it. In some cases, this understanding can lead to improving the learning scenario itself. In this article, we first show how to gather observations from various sources. Next, we propose to compose these observations into a trace. This trace reflects the activity of the learner and is used as a basis for interpreting his/her behaviour. We illustrate our approach through an experiment carried out in a dedicated web-based learning environment.
INTRODUCTION
Most of the time, a teacher prepares his/her learning session by organizing the different activities in order to reach a particular educational goal. This organization can be rather simple or complex according to the nature of
the goal. For instance, the teacher can decide to split the classroom into groups, ask the students to work on an exercise at the same time, put different solutions on the blackboard, hold a negotiation debate about the proposed solutions, and ask the students to write the chosen solution in their exercise books. The organization of the different sub-activities in an educational session is called a learning scenario. In traditional teaching, namely in an environment with no computers, a teacher tries to be as aware as possible of his/her students' performance and searches for indicators that allow him/her to know the status of a student's understanding and which activity in the learning scenario this student is performing. The teacher then adapts his/her scenario, for example by adding further introductory explanations or by keeping an exercise for another session. Once the training session is finished, the teacher often reconsiders his/her learning scenario and annotates it with remarks in order to remember particular points for the next time. For instance, s/he can note that the order of the sub-activities must be changed or that splitting into groups was not a good idea. In this way, the teacher continuously improves his/her learning scenario, thus following a quality approach. In an educational platform, formalisms such as IMS-LD exist to allow the teacher to describe learning scenarios (IMS, n.d.; Martel, Vignollet, Ferraris, David, & Lejeune, 2006). Once the scenario has been described, it can be enacted in the platform. The different actors (by actors, we mean all the participants involved in the learning scenario: students and teacher) can perform the planned activity. At this point, the teacher would like to have the same possibility of being aware of what is going on in the classroom, in order to react in an appropriate way. Of course, s/he cannot have the same feedback from the students, since s/he lacks human contact. However, in such environments, participants leave traces that can be used to collect clues, providing the teacher with awareness of the ongoing activity. These traces reflect in-depth details of the activity and can reveal very accurate hints for the teacher. Unfortunately, they are also very difficult to understand. We are thus faced with the problem of relating traces resulting from the performed activity to a recommended learning scenario. This article, focusing on a concrete experiment, describes the different aspects necessary to solve this problem.
• A first aspect to consider, central to the observation area, is the form of the traces. In order to understand the problem better, in the next section we show rough traces obtained from a real experiment.
• Secondly, we do not want to restrict our understanding to the tasks included in the planned scenario. We want to widen the sphere of the observation, so that other activities performed by a student are effectively traced. Even if these activities are outside of the scope of the
planned scenario, they may have helped him/her to complete the exercise or lesson. We thus need to collect traces from different sources.
• Finally, we propose to help the user better understand the generated traces: graphical representation is a good means of linking the learning scenario and the traces. We also take the different sources into account, in order to refine the understanding of the effective activity. We propose a metric to see how much of the activity performed by students is understood by the teacher, which is graphically represented on a "shadow bar."
Our approach concentrates on the link between the performed activity and the recommended scenario. We can take advantage of the interpretation of the traces (Egyed-Zsigmond, Mille, & Prié, 2003) in order to improve the scenario itself (Marty, Heraud, Carron, & France, 2004). Indeed, when reusing learning scenarios in different contexts, the quality of a learning scenario may be evaluated in the same manner as software processes, for instance with the CMM model (Paulk, Curtis, Chrissis, & Weber, 1993). The idea is to reconsider the scenario where some activities are systematically added or omitted by the users. In this article, we present the experiment and the goals we want to reach. We then highlight the diversity of the sources to be observed. The assistance provided to the user concerning the interpretation of the collected information is illustrated through an example. Finally, we draw conclusions from this experiment with respect to the goals we have set.

GOALS AND EXPERIMENTATION

Goals

When a teacher decides to use a pedagogical platform with his/her students, s/he first wants to reach a major objective which represents what the students must have acquired at the end of the session. However, using such a platform can also be useful for the purpose of analysis. In particular, using traces resulting from the performed activity may help the analyst to address the three aspects described in the previous section. From a research point of view, our goal is to verify that using multiple sources containing traces of the activity improves the analysis: it can provide significant information, especially when the student's activity is not completely in accordance with what is expected. We thus want to evaluate, through an experiment, to what extent these different sources are useful and how an interactive tool can facilitate the job of the analyst trying to interpret the activity.
Description of the Experimentation
Context

Our experiment was carried out with third-year students at the Graduate Business School of Chambéry (France). It lasted an hour and a half and took place at the same time in two different classrooms. There were eighteen students, a teacher, and a human observer in each of the classrooms. During this session, the students had to carry out a recommended learning scenario consisting of two independent activities. The purpose was practical work with the Adobe Photoshop software. All the necessary resources (course documents, original images) as well as communication tools (webmail, forums) were available on the experimentation platform. Students were not allowed to communicate except via computer. The teacher communicated with students using an Instant Messenger and validated the completed work. The observer took notes about the students' and the teacher's behaviours on a dedicated grid.

Observation Sources

In an educational platform, the different actors (teachers, learners) leave digital traces of their activities (Mazza & Milani, 2005). We propose using these traces in order to analyse users' behaviour. Multiple sources generating traces can be taken into account to make this analysis possible. In our example, we have chosen three kinds of digital observation sources: the learning scenario player, the intranet server and the users' stations. A fourth source, related to external events that cannot be traced by the machine, is also considered. Each of these sources requires a particular instrumentation, as shown in the following subsections.
Observation of the Learning Scenario Player

Because the platform supports the learning scenario activities, we obtain traces containing abstract elements with clear semantics that represent activities. To obtain such a trace, we have to define what the system has to track in the learning scenario. This could be done with a language such as UTL (Iksal & Choquet, 2005). For example, if we track the starting/ending status of each activity, the trace shown in Figure 1 suggests that reading document 1 helped the learner to succeed in exercise 1. These traces are easily interpretable by a teacher.
Figure 1. Observation of the scenario
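As an illustration of this kind of activity-level trace (the event format below is invented for the sketch and is not UTL or Pscenario output), start and end events can be paired to obtain one interval per activity:

```python
# Illustrative sketch: pairing start/end events of scenario activities
# into intervals, so an activity-level trace can be read in temporal order.
from datetime import datetime

# Invented sample events: (timestamp, activity, status)
events = [
    ("2006-03-15 09:00", "document 1 read", "started"),
    ("2006-03-15 09:20", "document 1 read", "ended"),
    ("2006-03-15 09:21", "exercise 1",      "started"),
    ("2006-03-15 09:55", "exercise 1",      "ended"),
]

started = {}
intervals = []
for ts, activity, status in events:
    t = datetime.strptime(ts, "%Y-%m-%d %H:%M")
    if status == "started":
        started[activity] = t
    else:  # "ended": close the interval opened by the matching start event
        intervals.append((activity, started.pop(activity), t))

for activity, begin, end in intervals:
    minutes = (end - begin).total_seconds() / 60
    print(f"{activity:15} {begin:%H:%M}-{end:%H:%M} ({minutes:.0f} min)")
```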
In the present case, the learning scenario was implemented with the Pscenario module (Pscenario, n.d.). This module, easy to integrate into our technological platform, offers the advantage of being instrumentable. At this level, the learning scenario is seen as a set of black boxes which represent arranged activities. Indeed, during the experiment, each activity of the scenario is only identified by its beginning and its end, thanks to the Pscenario module. With this source, we do not record or observe what happens within an activity. A first approach would thus encourage us to instrument only the web-based learning environment in order to observe as many abstract elements as possible. This approach is unfortunately too restrictive because it reduces the diversity of the observed elements whilst forcing us to retain only those for which semantics can be found a priori. Thus, the teacher can observe that certain learners omitted some of the scenario's steps (e.g., a course document was not read). However, the teacher will not be able to observe the activities which were not envisaged and were not performed directly in the web-based learning environment, but which were useful for the learner to finish the scenario (for example, the search for a solution to the exercise in a forum). To avoid this disadvantage, we decided to observe other sources as well.

Observation of the Intranet Server

The logs of the software used on the intranet server constitute a further source of observation. Apache (Apache, n.d.) is a widely used web server that provides log files containing all the actions carried out on the server. Nevertheless, this trace is not easily interpretable by a designer, because the observed elements are at a low level (i.e., close to the elementary actions of the machine or the software used). A trace made up of hundreds of lines such as the ones presented in Table 1 would not be directly exploitable by a teacher or an analyst. However, research on raw log abstraction has already been conducted in various domains, such as web usage mining, text analysis, and web log mining. This research provides numerous tools, for instance for log cleaning (Zaïane, 2001), log sessionization (Cooley, Mobasher, & Srivastava, 1999), or text summarization (Spärck Jones & Galliers, 1996). These tools can make some parts of the obtained log files interpretable, which allows us to combine them with other sources.
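For illustration only (the log lines below are invented examples in Apache's common log format, and the 30-minute inactivity timeout is one conventional heuristic in the spirit of Cooley et al., 1999), raw server hits can be grouped into per-user sessions before being combined with the other sources:

```python
# Sketch: sessionizing Apache-style access log lines with a 30-minute
# inactivity timeout, so low-level hits become coarser units of activity.
import re
from datetime import datetime, timedelta

LOG_RE = re.compile(r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)"')

sample_log = [
    '10.0.0.7 - - [15/Mar/2006:09:02:11 +0100] "GET /exercise1/statement.html HTTP/1.1" 200 5120',
    '10.0.0.7 - - [15/Mar/2006:09:40:02 +0100] "POST /exercise1/upload.php HTTP/1.1" 200 312',
    '10.0.0.7 - - [15/Mar/2006:11:05:44 +0100] "GET /forum/thread.php?id=3 HTTP/1.1" 200 2048',
]

TIMEOUT = timedelta(minutes=30)
sessions = {}  # host -> list of sessions, each a list of (time, request) pairs

for line in sample_log:
    m = LOG_RE.match(line)
    if not m:
        continue  # skip malformed lines (a very simple form of log cleaning)
    t = datetime.strptime(m.group("time").split()[0], "%d/%b/%Y:%H:%M:%S")
    host_sessions = sessions.setdefault(m.group("host"), [])
    if not host_sessions or t - host_sessions[-1][-1][0] > TIMEOUT:
        host_sessions.append([])  # start a new session after a long pause
    host_sessions[-1].append((t, m.group("request")))

for host, host_sessions in sessions.items():
    for i, sess in enumerate(host_sessions, 1):
        print(f"{host} session {i}: {len(sess)} hits, "
              f"{sess[0][0]:%H:%M}-{sess[-1][0]:%H:%M}")
```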
Observation of Users' Stations

Let us suppose that, during a learning session, a student converses with a friend using an Instant Messenger. This interaction is observed neither on the learning scenario player nor on the intranet server. However, this dialogue can be a major element in explaining the student's activity. We thus propose to instrument users' stations in order to observe all the users' interactions.
Table 1
Raw Log Data From an Apache File
We propose to use keyloggers for this purpose. Keyloggers are small applications that are executed on a client computer. Their basic principle is to record every keystroke in a text file. In our experiment, we have adapted this type of software to trace user activities by recording all keystrokes, the computer processes being executed, and the titles of the dialogue boxes displayed on the screen together with their content. Table 2 presents a sequence of logs in an XML-based language that enables logs to be structured according to different parameters such as sources, users, timestamps, applications and contents.

Table 2
Raw Log Data From a Keylogger

<producer>AOPe_pc-src-84 2006-03-15
<millisecond>003
NotreKLjava.exe
<eventType>keyKeyboard
<eventValue>"Coffee Room","de[SPACE]trouver[SPACE]"
<producer>AOPe_pc-src-84 2006-03-15
<millisecond>000
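As an illustration (the field names follow the tags visible in the excerpt above, but the record layout, the second sample event, and the filtering criterion are assumptions made for this sketch, not the actual keylogger schema), such events can be turned into uniform records and filtered by event type:

```python
# Sketch: representing keylogger events as uniform records and keeping
# only keyboard events; the fields are inspired by the excerpt above.
from dataclasses import dataclass

@dataclass
class KeyloggerEvent:
    producer: str     # workstation identifier, e.g. "AOPe_pc-src-84"
    date: str         # "YYYY-MM-DD"
    millisecond: str
    process: str      # application that produced the event
    event_type: str   # e.g. "keyKeyboard"
    event_value: str  # window title and typed text

events = [
    KeyloggerEvent("AOPe_pc-src-84", "2006-03-15", "003",
                   "NotreKLjava.exe", "keyKeyboard",
                   '"Coffee Room","de[SPACE]trouver[SPACE]"'),
    KeyloggerEvent("AOPe_pc-src-84", "2006-03-15", "000",
                   "word.exe", "processStart", ""),  # hypothetical event
]

keyboard_events = [e for e in events if e.event_type == "keyKeyboard"]
for e in keyboard_events:
    print(f"{e.date} [{e.producer}] {e.process}: {e.event_value}")
```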
We can note that the use of keyloggers may not respect anonymity, which constitutes one of the aspects of privacy. This assertion has to be moderated according to the legislation of the various countries where such systems are used. Within the framework of our experiments, we have chosen the following four principles, inspired by the Information and Telecommunication Access Principles Position Statement (http://www.cla.ca/about/access.htm):
• No data should be collected from people who are not aware of it;
• Individuals should have the choice to decide how their information is used and distributed;
• Individuals should have the right to view any data files about themselves and the right to contest the completeness and accuracy of those files;
• Organisations must ensure that the data is secure from unauthorized access.

Other Observations

In spite of the quantitative richness of the traces resulting from the learners' stations, some interactions crucial for modelling the learners' behaviour might still be missing. Thus, if the observed lesson takes place in a classroom where a teacher is present, none of the previously observed sources would mention an oral explanation from the teacher or an oral dialogue between students. In the case of distance learning, the problem is identical: it is impossible to know whether or not the learner is in front of his/her machine. Complementary sources of observation could consist of using video (Adam, Meillon, Dubois, & Tcherkassof, 2002; Avouris, Komis, Fiotakis, Margaritis, & Voyiatzaki, 2005) or human observers during the lesson. In our experiment, we decided to use human observers to validate possible assumptions.

Obtaining an Interpretable Trace

In this section, we show how a person can compose an interpretable trace by annotating the raw observations. We call this person the trace composer. This role could be played either by a student who wishes to keep a trace of his/her path in the session for metacognition purposes, or by a teacher who wishes to compare several traces in order to improve his/her scenario. In the case of our experiment, the trace composer is a teacher, an associate professor at the Ecole Supérieure de Commerce (ESC), who conducted the learning session.

Raw Observations

As stated above, we chose four sources of observation dealing with:
• the recommended process (i.e., the teaching scenario imagined by the teacher),
• the actions passing through the intranet server,
• the local activities (on the students' computers), and
• the external observation (two human observers present in each room in order to collect information concerning the exchanges carried out between users, specifying on a grid their contents and their type).
Each of these four sources generates different information. The digital traces (obtained through the first three sources) are presented in Figure 2 and correspond to increasing levels of granularity. We can notice that the traces are linked to each other. For instance, the learning scenario activities observed correspond to several observed elements on the server. For example, three elements observed on the server correspond to exercise 1 failed: (a) the retrieval of the statement, (b) the recording of the file containing the proposed exercise solution, and (c) the consultation of the negative validation from the teacher. There are observed elements on the server which do not match any activity envisaged in the learning scenario. For example, the learner posted a message in a forum (d). The observations on the learner's station can be divided into two categories:
• on the one hand, local interactions with software on the learner's station. For example, the learner can use Microsoft Word (w) to compose a text;
• on the other hand, interactions with other computers on the network. For example, communications using an Instant Messenger (f) and (g).
Among the latter interactions, those with the server can be easily identified and removed because they already appear in the server log.

Figure 2. Raw observations

Comparison with the Recommended Learning Scenario

In this section, we compare the activity carried out with the recommended activity in order to measure the differences between the achieved scenario and the recommended learning scenario. This comparison enables us to estimate the comprehensibility of our observation. In order to help the trace composer, we propose a graphic representation of this estimate.

Comparison Between Achieved Activity and Recommended Activity

This comparison is based on the description of the scenario and the traces obtained through the first source (at the same level of granularity). At this step, there is no need to access the other sources. There are many dimensions along which to measure the variations from the recommended learning scenario: for instance, the quantity of documents used and produced by the learner, or the wealth of exchanges with others. Within the framework of our experiment, it was important for the teacher that the learner finish the learning scenario in a given time. We thus chose to compare the duration of the activity carried out with the duration recommended by the designer of the activity.
Intuitive Estimate of the Clarity of the Observation

We define the comprehensibility of a zone as the probability that no ignored activity took place during this time span. The comparison system enables us to represent the observation of an activity whose duration is close to the one recommended by the teacher with a strong comprehensibility. On the contrary, if we observe an activity whose duration is clearly longer than what was recommended, then the probability that another activity (even one not related to the training) was missed is high. We thus consider that our observation was incomplete in this shaded area. We therefore offer the trace composer observations available from the other sources to let him/her complete the trace.

Graphic Representation of the Comprehensibility: The Shadow Bar

The shadow bar is a graphic representation of the comprehensibility of an observation. The colour of each area is determined by the estimate of comprehensibility: clear if the observed activities are in line with their recommended durations; dark if the observed activities exceed the time recommended for their performance; and completely black if no observation explains the elapsed time. Figure 3 presents the shadow bar corresponding to an observation session. In this example, only observations of the activities of the learning scenario appear in the trace. (a) is the time recommended for exercise 1, (b) is the time exceeding the recommended duration, and in the black zone (c) no activity linked with the scenario was observed.

Figure 3. The shadow bar

Prototype

We have designed a user interface prototype that implements a multi-source visualisation system coupled with a shadow bar. The prototype was developed with Java 2D. The visual representation of the sources and the shadow bar are identical to the previous figures except on the following point: there is
less visual information, in order to decrease the cognitive overload of the trace composer, but all the previous data are still available in contextual pop-up menus. For instance, the status (failure/success) of the exercise 1 box is not permanently displayed because it is considered an attribute of exercise 1. All the attributes of a log box can be displayed locally in a popup. For an example of such a display, see the sixth box of the server source in Figure 4. The information in the popup is displayed with a fish-eye, which keeps a single line readable inside a large amount of data. This prototype was implemented to help the trace composer. The interactions with the prototype are described in the next section.
Figure 4. The prototype of the trace composition tool
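To make the shading of the shadow bar concrete, the following minimal sketch (zone labels and durations are invented, and the mapping from duration ratio to shade is our own simplification, not the authors' exact formula) computes one comprehensibility value per zone:

```python
# Sketch: one shade value per zone of the shadow bar.
# 1.0 = clear (observed duration matches the recommendation),
# values near 0.0 = dark (much unexplained time), 0.0 = black.
from typing import Optional

def shade(observed_minutes: float, recommended_minutes: Optional[float]) -> float:
    """Return a comprehensibility shade in [0, 1] for one zone."""
    if recommended_minutes is None or observed_minutes <= 0:
        return 0.0  # no scenario activity explains this time span: black
    # The more the observed duration exceeds the recommendation,
    # the more likely it is that an unobserved activity was missed.
    return min(1.0, recommended_minutes / observed_minutes)

# (zone label, observed minutes, recommended minutes or None) -- invented
zones = [
    ("document 1 read", 18, 20),
    ("exercise 1",      55, 35),
    ("unexplained gap", 25, None),
]

for label, observed, recommended in zones:
    print(f"{label:17} shade={shade(observed, recommended):.2f}")
```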
Interactive Construction of the Interpretable Trace

In this section, we show how the shadow bar helps the trace composer to choose and annotate the observations among the various sources. Figure 3 shows the initial situation presented to the trace composer. Only the observations of the activities of the learning scenario are presented. The shadow bar is black during times when no activity has been identified and darkens when an activity exceeds the duration recommended by the teacher. When the trace composer selects shaded zones, the observations on the server corresponding to these zones are displayed in the lower part of the chart. For instance, Figure 5 shows the state of the visualisation tool when the trace composer has selected the first totally black zone (between exercise 1 and document 1 read). The trace composer can retain observed elements that s/he considers significant within the framework of the observed learning task. For instance, in Figure 6, four elements (highlighted in black) are of a comparable nature: forum consultation. The trace composer selects these four elements and annotates them as "forum".
Figure 5. Using the shadow bar to show relevant boxes
Figure 6. Selecting and annotating boxes
The result obtained is presented in Figure 7: a new observation was added to the trace, which makes the shadow bar clearer in the corresponding zones. If the trace composer considers some observations not to be significant for the learning task, s/he can mark them as explanatory elements of the elapsed time without including them in the trace.
Figure 7. Adding observations from the server
Figure 8. Choosing observation on the learner’s station
For example, in Figure 8, if s/he designates the two remaining boxes as not significant, the shadow bar in the corresponding zone in Figure 9 is cleared, with no observation being added. If shaded zones persist in the bar, observations on the learner's station can be integrated in a similar way. The information in pop-up menus (a) and (b) indicates the direction of the communication, which depends on who initiated it. The idea is to distinguish communication where the student is the initiator of the exchange, which is probably related to the current activity, from communication where the initiator is someone else, which may not be related to the current activity. For example, in Figure 8, the dialogues in zone (a) were initiated by an external person and were not taken into account by the trace composer. On the other hand, the communications with another student of the session in (b) were annotated as "dialog" in (c).

Result

Figure 9 presents the trace obtained at the end of the process of explaining the shadow bar. This trace contains only observations of activities related to the observed learning task, according to the trace composer. The clearness of the shadow bar at the end guarantees a low probability of having missed the observation of a learning activity, according to the time metric used in this case.
Figure 9. Interpretable trace
DISCUSSION

Discussion of Preliminary Results

The amount of data recovered during our experiment is significant and requires the development of specific tools to help the analysis, in particular to automate the composition of elements resulting from different sources. The complete analysis of the results of this experiment is a work in progress. We want to report here on a first analysis of this experiment. The aim is to take the different traces obtained through the trace composer for a student who succeeded and to compare them with those obtained for one who failed. The learning scenario comprised two exercises, and only 7 out of the 36 students finished the scenario before the time limit. We thus isolated the observations of these 7 students in order to compare them with the others. The scenarios carried out by these 7 students are rather similar. If we consider a sufficiently large unit of time (10 minutes) to take into account the various speeds of execution of these students, then the trace of the scenario activities presented on the right of Figure 10 summarises their activity.

Figure 10. Standard traces of students according to their final results

As a primary conclusion, we supposed that the 7 students who had finished were simply faster than the others. A more detailed investigation, illustrated in Figure 11, showed us that the true common point between these students was an additional activity, not defined in the learning scenario: all these students requested the validation of exercise 1 by the teacher present in their room.

Figure 11. Standard trace of a student who finished the scenario

Among the students who had not finished, some read the session documents again; others restarted exercises from the preceding session or compared their results with other students before starting exercise 2. Their traces are too different to be represented in a standard trace. Consequently, it is not possible to compare them with the trace presented in Figure 11. We note the emergence of a new task carried out by all learners who successfully completed the two exercises. It is then legitimate to revise the learning scenario and to wonder whether the insertion of this new task in the scenario is necessary or not. Obviously, the system will not make such a decision. It is in general the designer of the scenario who will deliver his/her opinion on the usefulness of this modification. We think that this public needed to be reassured about the quality of their work before starting the second exercise. If we had to give this lesson again with this learning scenario, we would add to the scenario an activity of validation by the teacher between the two exercises.

Discussion of the Hypothesis of Our Experiment

Our methodology is mainly pragmatic and empirical. We chose to develop a tool allowing us to obtain and keep relevant information about the tasks done by students during a practical session on a technology-enhanced learning system. We validate the results through real experiments: we chose to interfere as little as possible with a classical session in order to get real and valid data. Students were warned that their usual tools were equipped with observation functionalities. The pedagogical sessions took place as usual. The experiment gave us results concerning the improvement of the learning scenario. In order to reach this objective, our approach involves observing the users in a usual working session and tracing the overall activity on the educational platform. We want to point out that multiple sources are fundamental to understanding the activity in detail, since they provide valuable additional information. Within its particular scope, the experiment is obviously a success. However, we are aware of the fact that the experiment presented several biases. Additional experiments should be scheduled to generalise our first results. Thus, it would be valuable to set up a new experiment in which the trace composer and the teacher are not the same person, in which the teaching domain is not related to computer science, with a higher number of students, and with a more complex pedagogical scenario. The external source was not used in this experiment. We believe that in some cases it could be very useful, especially to explain dark areas (e.g., the student went out of the classroom).
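As an illustration of the kind of comparison used in the preliminary analysis above (activity labels, durations and the standard trace below are invented), a student's scenario trace can be binned into 10-minute units and compared with the standard trace of the students who finished in time:

```python
# Sketch: comparing a student's scenario trace, binned into 10-minute
# units, with the "standard" trace of the students who finished in time.

STANDARD = ["doc1", "doc1", "ex1", "ex1", "validation", "ex2", "ex2"]  # one label per bin

def bin_trace(events, bin_minutes=10):
    """events: list of (start_minute, end_minute, label); returns one label per bin."""
    total = max(end for _, end, _ in events)
    bins = []
    for t in range(0, total, bin_minutes):
        # keep the label of the activity covering the middle of the bin, if any
        mid = t + bin_minutes / 2
        label = next((lab for s, e, lab in events if s <= mid < e), "idle")
        bins.append(label)
    return bins

def similarity(trace, reference):
    """Fraction of bins in which the two traces show the same activity."""
    n = min(len(trace), len(reference))
    return sum(trace[i] == reference[i] for i in range(n)) / n if n else 0.0

student_events = [(0, 20, "doc1"), (20, 45, "ex1"), (45, 50, "validation"), (50, 70, "ex2")]
trace = bin_trace(student_events)
print(trace)
print(f"similarity to the standard trace: {similarity(trace, STANDARD):.2f}")
```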
CONCLUSION

The work presented in this article proposes a multi-source approach to help an analyst to understand an activity better. We focussed on the match between a recommended pedagogical scenario and the activity performed. This work was illustrated throughout the article by an experiment carried out with two groups of students. The discussion section of this article gives an idea of the benefits which can be drawn from such an approach. The prospects at the end of this experiment are numerous. First, we used a factor of comprehensibility based on the time taken to carry out certain activities. This is only one of numerous possible metrics. We are going to consider other metrics, such as the degree of communication. We noticed from a first general analysis that we can detect a panic state when the exchanges between participants increase without reason. In addition, the external source of observation, although presented, has not yet been fully explored. In this experiment, it was simply used as validation for a certain number of assumptions. It would be necessary to provide the trace composer with a means of using this source of information, like the others, when s/he wishes to eliminate shaded areas. Other directions based on this work are currently being explored. We decided to investigate a more precise observation of what is going on throughout the session in the classroom, as in (Feng & Heffernan, 2005). This observation is oriented by user classes (with respect to their skills), and a visualization dashboard has been proposed in (France, Heraud, Marty, Carron, & Heili, 2006). This work then led us to consider a more general architecture based on specialized agents (Carron, Marty, Heraud, & France, 2006). More generally, increasing the wealth of the observation level and finely clarifying the sequencing of a lesson are paramount stages. They will make it possible to improve the quality of the learning scenarios implemented and, in a second phase, to evaluate their level of maturity (Marty et al., 2004).

References

Adam, J. M., Meillon, B., Dubois, M., & Tcherkassof, A. (2002). Methodology and tools based on video observation and eye tracking to analyze facial expressions recognition. 4th International Conference on Methods and Techniques in Behavioral Research.
Apache (n.d.). The Apache Software Foundation, Apache HTTP Server Project. http://httpd.apache.org/
Avouris, N., Komis, V., Fiotakis, G., Margaritis, M., & Voyiatzaki, E. (2005). Logging of fingertip actions is not enough for analysis of learning activities. AIED'05 workshop on Usage Analysis in Learning Systems, 1-8.
Carron, T., Marty, J. C., Heraud, J. M., & France, L. (2006). Helping the teacher to re-organize tasks in a collaborative learning activity: An agent-based approach. The 6th IEEE International Conference on Advanced Learning Technologies, 552-554.
Cooley, R., Mobasher, B., & Srivastava, J. (1999). Data preparation for mining world wide web browsing patterns. Knowledge and Information Systems (KAIS), 1(1), 5-32.
Egyed-Zsigmond, E., Mille, A., & Prié, Y. (2003). Club (Trèfle): A use trace model. 5th International Conference on Case-Based Reasoning, 146-160.
Feng, M., & Heffernan, N. T. (2005). Informing teachers live about student learning: Reporting in the Assistment system. AIED'05 workshop on Usage Analysis in Learning Systems, 25-32.
France, L., Heraud, J. M., Marty, J. C., Carron, T., & Heili, J. (2006). Monitoring virtual classroom: Visualization techniques to observe student activities in an e-learning system. The 6th IEEE International Conference on Advanced Learning Technologies, 716-720.
Iksal, S., & Choquet, C. (2005). Usage analysis driven by models in a pedagogical context. Workshop on Usage Analysis in Learning Systems at AIED2005: 12th International Conference on Artificial Intelligence in Education, 49-56.
IMS (n.d.). Global Learning Consortium: Learning Design specification. http://www.imsglobal.org/learningdesign/
Martel, C., Vignollet, L., Ferraris, C., David, J.-P., & Lejeune, A. (2006). Modeling collaborative learning activities on e-learning platforms. The 6th IEEE International Conference on Advanced Learning Technologies, 707-709.
Marty, J. C., Heraud, J. M., Carron, T., & France, L. (2004). A quality approach for collaborative learning scenarios. Learning Technology Newsletter of IEEE Computer Society, 6(4), 46-48.
Mazza, R., & Milani, C. (2005). Exploring usage analysis in learning systems: Gaining insights from visualisations. AIED'05 workshop on Usage Analysis in Learning Systems, 65-72.
Paulk, M. C., Curtis, B., Chrissis, M. B., & Weber, C. V. (1993). Capability maturity model, version 1.1. IEEE Software, 18-27.
Pscenario (n.d.). TECFA's pedagogical scenario tool for PostNuke. http://tecfaseed.unige.ch/door/ (download: http://tecfaseed.unige.ch/door/index.php?name=Downloads&req=viewsdownload&sid=12)
Spärck Jones, K., & Galliers, J. R. (1996). Evaluating natural language processing systems: An analysis and review (Lecture Notes in Artificial Intelligence). Springer.
Zaïane, O. R. (2001). Web usage mining for a better web-based learning environment. Conference on Advanced Technology for Education, 60-64.
Acknowledgements

This work has been performed in the ACTEURS project (2005-2007), funded by the French Ministry of Education (ACI "Terrains, techniques, théories"), and in the Computer Science Cluster (project EIAH) funded by the Rhône-Alpes Region.
Usage Analysis in Learning Systems, 133-156
From Data Analysis to Design Patterns in a Collaborative Context

VINCENT BARRÉ
LIUM/IUT de Laval, France
[email protected]

CHRISTOPHE CHOQUET
LIUM/IUT de Laval, France
[email protected]

HASSINA EL-KECHAÏ
LIUM/IUT de Laval, France
[email protected]

The underlying aim of the work related in this article was to define Design Patterns for recording and analyzing usage in learning systems. The "bottom-up" approach implied in defining a Design Pattern brought us to examine the data collected in our learning system in three different lights: (1) the data type, (2) the human roles involved in the production of the data or interested in its uses, and (3) the nature of the data analysis. This method has allowed us to gain a global view of the data, which can easily be generalized and formalized.
Introduction

The desynchronization between design and use in distance education penalizes the iterative optimization of a system's quality, because uses are not taken into account with a reengineering objective. That is why, in (Corbière & Choquet, 2004a), we proposed a meta-architecture model which explicitly integrates a step dealing with the observation and behaviour analysis of distance learning systems and learning-process actors, in an iterative process guided by design intentions. We underline, in particular, the need for a formal description of the designer's point of view of the scenario, called the prescriptive scenario, as well as assistance in usage
analysis by allowing the comparison of descriptive scenarios (an a posteriori scenario that effectively describes the learning situation's sequence (Corbière & Choquet, 2004b; Lejeune & Pernin, 2004)) with the predictive scenario. This produces information that is significant for designers from a pedagogical point of view when they perform reverse engineering or reengineering (Chikofsky & Cross, 1990) of their systems. In the framework of the REDiM project (a French acronym for Reengineering Driven by Models of e-Learning Systems), we are particularly interested in supporting the implementation of designers' two main roles: (i) to establish the predictive scenario of a given learning situation, and (ii) to anticipate the construction of the descriptive scenario by defining the observation needs of the situation, allowing the effective evaluation of the learners' activity. Thus, describing in detail and formalizing the data recorded during a learning session, in order to build significant indicators which qualify the activity, is a crucial step for us. Moreover, such analysis expertise could be capitalized in a pattern, which could be used in the design of another TEL system. In this article, we focus on a particular collaborative e-learning system named Symba. More precisely, we observe the effective use of a pedagogical scenario in the context of a collective activity supported by collaborative tools. Our experiment thus consists of a project management collective activity and, more specifically, of a web-project management activity (specification and implementation of a website). From our pedagogical reengineering viewpoint, considerable interesting information can arise from this experiment. In particular, we are interested in comparing descriptive scenarios with predictive ones. Moreover, in a collaborative context, another interesting possibility is to compare the roles emerging from the activity with those anticipated by the designers. In our experiment, and in accordance with a standardization context, we have used the pedagogical model arising from the IMS consortium's Learning Design (Koper, Olivier, & Anderson, 2003) in order to describe learning activities and to make pedagogical scenarios explicit. Nevertheless, we only use IMS LD as a means for designers to express their intentions, and not from an implementation perspective. The underlying aim of the work related in this article was to define Design Patterns for recording and analyzing usage in learning systems, in the framework of the DPULS project (DPULS, 2005). The bottom-up approach implied in defining a Design Pattern brought us to examine our experiment's data in different lights: (1) the data type, (2) the human roles involved in the production of the data or interested in its uses, and (3) the nature of the data analysis. This method has allowed us to gain a global view of the data, which can easily be generalized and formalized. We are thus within the framework depicted in Figure 1. We have set up an experiment (Symba) to produce data that will be described and analyzed so as to produce indicators (an indicator being a feature of the data highlighting its connection with
an envisaged event having a pedagogical significance (DPULS, 2005)). Once formalized, the analyses will then lead to Design Patterns. Thus, according to the framework depicted in Figure 1, and after a short presentation of the Symba experiment, we will focus on the three main viewpoints leading towards design pattern elaboration. More precisely, we will first have to consider the formal definition of the roles endorsed by the actors of the system and to specify the motivation that each role has towards data analysis. In (DPULS, 2005) we define a role as a set of tasks performed by one or more human or artificial agents according to specific needs and competencies, e.g., designer, learner, teacher… The second viewpoint will allow us to clarify what kind of data will be analyzed, that is, to formalize and classify each piece of data that will be manipulated by actors in their various roles in our experiment. The last viewpoint, which will lead to the formalization of design patterns, will focus on data analysis, distinguishing who analyzes the data from who uses the analysis results. These three viewpoints will then lead to the formalization of two kinds of design patterns. One kind will be used by analysts in order to produce new data (that is, it will represent a formalization of their analysis knowledge), whereas the other kind will be used for higher-level purposes, that is, for engineering, reengineering or regulation purposes; those patterns will lead to the production of indicators.

Presentation of the Symba Experiment

We have used an experimental CSCL support environment called Symba (Betbeder & Tchounikine, 2003). This environment is a web-based system, developed by the LIUM laboratory in the framework of a Ph.D. study, in order to support collective activities in a learning context.
Figure 1. Focus on some aspects of our methodology
It was designed with a double objective: (i) allowing students to explicitly work on their organization, and (ii) providing tailorability (Morch & Mehandjiev, 2000) features which let students decide which tools and resources they want to have accessible in order to achieve the tasks they have defined. With this system, students have to develop a dynamic website using a previously taught web-project management methodology. A predictive scenario of our experiment is depicted in Figure 2; we will now detail this scenario. According to our theoretical framework, students first have to work collectively (and agree) on the project organization (such as what is to be done, who does what, when tasks have to be finished, and which tools are necessary for a particular task) before beginning the second part, which consists of collectively performing the tasks they have defined, according to their organization. During all their activities, learners are self-managed and have to define (and collectively agree on) the roles they will endorse, and act accordingly. From this viewpoint, it is interesting to take into account that roles have many meanings, in particular a functional meaning (that is, related to an action, linked to people's status in an organization) and a socio-affective meaning (that is, related to the way people bring their personality into a functional role). In concrete terms, the learners' activity was organized in five steps (see Figure 2), and for each one the instructional designers set up a task model (see Figure 3), which was communicated to the learners for the first three steps. The formal description of what a correct project organization is (that is, the task model for each step) is formalized using IMS Learning Design (see data S-4.1). In order to make this task model explicit, we present in Figure 3 what has to be done for the second step.

Many Roles, Different Motivations in Data Analysis

In our system, one actor can endorse many roles (for example, the teacher can be either instructional designer, assessor tutor or observed uses analyst). We think that it is very important to focus on role definition rather than on actor definition. We will thus begin by presenting the different roles endorsed by the actors of our system (see Figure 4), and then we will detail the motivation that those roles have in data analysis.
Presentation of Roles

This experimental system is used by actors endorsing four distinct categories of roles (one actor can play more than one role in our experimental system). The first category is made up of 56 higher-education learners from the Laval Institute of Technology, University of Maine (France). They were organized into small groups of five students and worked either at the university or at home, using the tools offered by Symba. These tools are centered around the description, organization and perception of the activity, but learners must also use the environment in order to make the organization of their work explicit, with a sharable plan and task editors.
Figure 2. Predictive scenario of the web-project management proposed to the learners

The activity lasts for four weeks (35 working hours per week), and a predictive pedagogical scenario implying collaborative learning was proposed, even though students are free to adopt or modify it. One can notice that this predictive scenario may involve concepts that have not yet been taught.
Figure 3. One step of students’ predicted organization
Figure 4. Four kinds of roles (with sub-roles) in our experiment
The second category is made up of three kinds of tutors. Moderator tutors monitor the activity within the learning session and fill in reports for evaluating tutors (i.e., assessor tutors), who are in charge of evaluating the learners' activity; this measures the knowledge they have acquired. Lastly, domain experts are in charge of assisting learners in their tasks by helping them to solve specific problems connected to their domain of expertise. The third category is made up of instructional designers. They specify the predictive pedagogical scenario and the uses of the learning system to be observed; they also use the results of the analysis of the effective use of the learning system in order to improve it (reengineering process). The last category is made up of two kinds of analysts. Observed uses modelers build traces from collected raw data, whether recorded by the learning system or not (that is, from collected observed uses), whereas observed uses analysts analyze the observed uses in order to synthesize information.
Different Motivations in Data Analysis

In our experiment, people in many roles want to analyze data (and are interested in doing so). Instructional designers want to verify whether the roles they have predicted are adopted by learners and to detect unforeseen new roles. They are also interested in understanding the effective progress of a session in order to discover inconsistencies in it, for reengineering purposes. Observed uses modelers are interested in finding new techniques to improve their analysis abilities, whereas observed uses analysts are interested in finding new patterns to improve their analysis abilities. Part of a moderator tutor's job is to make reports for assessor tutors on learners' abilities to collaborate and to work in a group. Assessor tutors want to evaluate the knowledge acquired by learners in web-project management by verifying whether the produced organization is coherent with the method taught during the web-project management courses. Lastly, domain experts are also involved in analyzing data. Whilst they do not currently analyze data, since this analysis cannot be done during the learning session (the analysis being exploratory and manual at this first stage), they would nevertheless be interested in analyzing data so as to understand what learners have done previously when they ask them for help.
What Kind of Data is Being Analyzed?

In this section, we distinguish primary data (data that have not been processed) from derived data (data obtained from other data). In our experimental system, we have three kinds of primary data (see Figure 5): raw data (data recorded by the learning system), additional data (data linked to an activity but not recorded by the system during sessions) and content data (data produced by the system actors). We also have derived data, some of which are indicators (that is, they have to be interpreted, taking into account the learning activity, the profile and roles of the actors, as well as the interaction context), while others are intermediate data. From a reengineering perspective, we will use some raw data (whether recorded by the learning system or not) in order to derive some new data that
will be useful for system actors. We will also need some additional data, such as the predictive scenario for the activity, and content data, that is, the outcomes produced by actors during their activities. From a pedagogical perspective, a learning assessment includes both cognitive achievement and collaboration capability aspects. Whereas the first aspect can be evaluated by analysis of the learners' productions (Jonassen, Davidson, Collins, Campbell, & Bannan Haag, 1995) and mainly concerns the data labeled S-3.1, S-3.2, S-5.1 and S-5.2 in Figure 5, the second can be evaluated using behaviour observation and perception analysis (Henri & Lundgren-Cayrol, 2001). In this article, we focus on this second aspect, and more precisely on role emergence, and we now detail the most important data that help us to formalize the emerging roles arising from the learners' activity. We first detail the raw data, whether recorded by the learning system or not. Please note that much of our raw data comes from communication tool traces and that, in the original traces, messages are written in French. All those messages have been translated into English. Data S-1.2 (data arising from chat) corresponds to the transcription of all communications exchanged between learners via the chat service. A partial transcription of such messages can be found in Figure 6. Data S-1.3 (data arising from newsgroups) corresponds to the transcription of the entire set of messages posted on the newsgroup services. An example of such a message can be found in Figure 7.
Figure 5. Dependencies between data
Figure 6. Excerpt of messages exchanged on the chat service
Figure 7. Excerpt of a message exchanged on the newsgroup tool
Data S-2.1 (data arising from questionnaires) consists of questionnaires whose main goal is to evaluate group functioning by measuring parameters such as participation, collaboration and organization. Student answers to the questionnaires are measured with a Likert scale (Babbie, 1992), graduated from 1 to 5 (strongly disagree, disagree, neither agree nor disagree, agree, strongly agree). Learners can also give detailed explanations about their answers. An example of such a questionnaire can be found in Figure 8.
Figure 8. Excerpt of a completed questionnaire
We will now detail the data obtained by combining primary data with other data (either primary or already synthesized). Data S-3.2 (data related to collaborative communication tools, i.e., role emergence from chat) is derived from the transcription of all communications exchanged via the chat service. Emerging roles are extracted from this transcription using pragmatic markers (Cottier & Schmidt, 2004) that need to be defined. Notice that a particular learner can assume several roles throughout a given learning session; all these roles are reported. You therefore need to define the pragmatic markers associated with each role you want to find within the gathered sequences. To do so, you need to detect both the lexical field of pertinent terms linked to a particular role and the dialog topics. Detecting lexical fields mainly consists in finding words that can be treated as activity markers, such as: work, organization, design, to work, to organize, and to design. You must often rely on the fact that these terms are the most frequently used throughout the dialog in order to consider them pertinent. To detect a dialog topic, you can use its proximity to the lexical fields previously detected. In order to correctly define pragmatic markers, you also need to identify who is the originator and who are the recipients of the gathered dialog, and to interpret the meaning of the exchange (see Figure 9 for an example). The pragmatic markers linked to our experiment have been manually extracted and organized into an ontology; we are currently working on an automated service that will extract emerging roles from communication tools' tracks. This data therefore consists of a list of roles arising from observed communications. This list is annotated with information about which student(s) take which role(s) and is stored as a structured file (see Figure 10). One can notice that these roles and assignments can be identical to those arising from other communication tools (e.g., newsgroups, see data S-3.3).
Figure 9. Pragmatic markers example: lexical fields are highlighted, dialog topics are underlined
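As a rough illustration of this marker-based extraction, the sketch below scans chat messages for role-specific lexical fields and produces an annotated list of (student, role) assignments. It is only a minimal sketch: the marker lists, message format and role names are invented for the example and are much simpler than the ontology of pragmatic markers used in the experiment.

```python
from collections import defaultdict

# Hypothetical pragmatic markers: for each role, a small lexical field of
# pertinent terms (the real markers are organized in an ontology and also
# take dialog topics, originator and recipients into account).
PRAGMATIC_MARKERS = {
    "functional leader": {"work", "organize", "organization", "task", "deadline"},
    "designer": {"design", "mock-up", "layout", "prototype"},
}

def extract_roles(messages):
    """messages: list of (author, text) tuples taken from a chat transcript.
    Returns {author: {role: number of matching messages}}."""
    assignments = defaultdict(lambda: defaultdict(int))
    for author, text in messages:
        words = {w.strip(".,!?;:").lower() for w in text.split()}
        for role, lexical_field in PRAGMATIC_MARKERS.items():
            if words & lexical_field:
                assignments[author][role] += 1
    return assignments

chat = [
    ("alice", "We should organize the work before Friday"),
    ("bob", "I can design a first mock-up of the site"),
    ("alice", "OK, I will split the task list and set a deadline"),
]
print({author: dict(roles) for author, roles in extract_roles(chat).items()})
# {'alice': {'functional leader': 2}, 'bob': {'designer': 1}}
```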
Figure 10. Annotated list of roles arising from chat (or newsgroup) analysis
Data S-3.3 (data related to collaborative communication tools, i.e., role emergence from newsgroups) is derived from the transcription of all communications exchanged on newsgroups. Emerging roles are extracted from this transcription using pragmatic markers that need to be defined (see Figure 11). This data therefore consists of a list of roles arising from observed communications (see Figure 10). This list is annotated with information about which student(s) take which role(s) and is stored as a structured file (same structure as for data S-3.2). One can notice that these roles and assignments can be identical to those arising from other communication tools (e.g., the chat service, see data S-3.2). Data S-3.4 (data related to questionnaire synthesis) is made of the answers to questionnaires (data S-2.1) synthesized as percentages and reported within an evaluation grid summarizing this information for each question.
Figure 11. Pragmatic markers identifying a 'functional leader' role in newsgroups
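The questionnaire synthesis (data S-3.4) can be pictured with the short sketch below, which turns Likert-scale answers into per-question percentages. The question labels and answer values are invented for the example; the real synthesis is reported in an evaluation grid and also keeps the learners' detailed free-text explanations.

```python
from collections import Counter

LIKERT = {1: "strongly disagree", 2: "disagree", 3: "neither agree nor disagree",
          4: "agree", 5: "strongly agree"}

def synthesize(answers):
    """answers: {question: [Likert values given by each learner]}.
    Returns {question: {Likert label: percentage of answers}}."""
    grid = {}
    for question, values in answers.items():
        counts = Counter(values)
        grid[question] = {
            LIKERT[v]: round(100 * counts[v] / len(values), 1) for v in sorted(counts)
        }
    return grid

# Invented answers from a four-learner group.
answers = {
    "The group was well organized": [4, 5, 4, 2],
    "Everyone participated in the work": [3, 4, 4, 4],
}
for question, distribution in synthesize(answers).items():
    print(question, distribution)
```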
Data S-3.5 (data related to new roles arising from learners' activity). The study of the interactions made with the Symba communication tools (data S-3.2 and data S-3.3), as well as the detailed answers given to the questionnaires (data S-2.1), allows for the evaluation of the collaborative process from a cognitive and socio-affective viewpoint. This manually derived data consists of an ad-hoc free-text report (whose writing is guided by some questions) and allows the evaluation of the coherence between the different facets of each role. To illustrate this data, we can take an example from our experiment, in which we observed the following key points: (i) in the transcription of chat messages (data S-1.2), one student was unanimously appointed project leader; one can therefore expect that this student will play his leader role and that he was chosen by his colleagues because they think he is qualified in group management. (ii) The analysis of data S-1.2 with the help of pragmatic markers (in order to produce data S-3.2, the roles emerging from chat messages) indicates that most of the interactions are organized around another student. (iii) In the detailed answers to the questionnaire (data S-2.1), everyone acknowledges that the initially designated project leader was rejected by all other team members, even though he tried to fulfill his (task-based) functional role. This rejection was based on the project leader's lack of communication skills. To summarize the situation: the instructional designers defined a predictive role of project leader and expected the student assuming it to act as a leader in his own right. Although he was effectively a leader with respect to the tasks to be done, he was not fully accepted as a leader with respect to his communication skills. Consequently, one can suggest that the project leader role be split into two facets: a functional one and a socio-affective one (see Figure 12).
Figure 12. Functional and socio-affective leadership (Hotte, 1998)
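A simple way to picture the check that data S-3.5 supports is sketched below: given the leader designated in the predictive scenario and the leaders emerging from the chat and questionnaire analyses, it flags a functional vs. socio-affective mismatch. The data layout and student names are invented; in the experiment this comparison is done manually in a free-text report.

```python
def leadership_mismatch(designated_leader, emergent_roles):
    """emergent_roles: {student: set of roles observed for that student},
    e.g. derived from data S-3.2/S-3.3 and questionnaire answers."""
    socio_affective_leaders = {
        student for student, roles in emergent_roles.items()
        if "socio-affective leader" in roles
    }
    if designated_leader in socio_affective_leaders:
        return None  # both facets are carried by the same student
    return {
        "functional leader": designated_leader,
        "socio-affective leader(s)": sorted(socio_affective_leaders),
    }

# Invented observation: the designated leader kept the functional facet only.
observed = {
    "student_A": {"functional leader"},
    "student_B": {"socio-affective leader"},
}
print(leadership_mismatch("student_A", observed))
# {'functional leader': 'student_A', 'socio-affective leader(s)': ['student_B']}
```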
Moreover, our experiment takes place in the context of a self-managed group. In such a context, the two facets must be simultaneously present in the student assuming the project leader role (in order to coordinate interactions inside the group and to facilitate communication between group members). Otherwise, if they are not present in the same person, the group disperses and the quality of collaboration decreases. Thus, in our experiment, data S-3.5 allows us to verify the coherence of the different facets of the roles arising from the activity. Data S-3.6 (collaboration quality) corresponds to an indicator allowing the evaluation of the quality of collaboration between learners. This data consists of an ad-hoc free-text report (whose writing is guided by some questions) and is produced manually using the reports on role coherence arising from learners' activity (data S-3.5) combined with the task model produced by designers (data S-4.1). This verifies the parallels between predicted roles and observed ones (at a per-learner level) and requires information on whether collaboration takes place between learners or not, which is derived from the questionnaires, in both synthesized form and detailed answers (that is, data S-2.1 and data S-3.4). We will lastly describe one piece of additional data which is used to shed light on the synthesized data. Data S-4.1 (task model specified by instructional designers) corresponds to the task model anticipated by designers (see Figure 3), that is, an indication of the activity sequence that learners are supposed to produce using the workplace organization from Symba. This task model is expressed using IMS Learning Design (technically, it is an XML file conforming to the IMS/LD specification, see Figure 13).
Figure 13. Predictive task organization (excerpt)
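To give an idea of how the predictive task model (data S-4.1) can be exploited programmatically, the sketch below pulls role and activity titles out of an IMS LD-like XML excerpt. The XML fragment is invented for the example and much smaller than a real IMS/LD manifest; the element names follow the IMS LD information model, but the exact structure of the experiment's file is not shown here.

```python
import xml.etree.ElementTree as ET

# Invented excerpt loosely following the IMS Learning Design vocabulary.
PREDICTIVE_SCENARIO = """
<learning-design xmlns="http://www.imsglobal.org/xsd/imsld_v1p0">
  <components>
    <roles>
      <learner identifier="R-project-leader"><title>Project leader</title></learner>
      <learner identifier="R-team-member"><title>Team member</title></learner>
    </roles>
    <activities>
      <learning-activity identifier="A-plan"><title>Plan the web project</title></learning-activity>
      <learning-activity identifier="A-design"><title>Design the site structure</title></learning-activity>
    </activities>
  </components>
</learning-design>
"""

NS = {"ld": "http://www.imsglobal.org/xsd/imsld_v1p0"}
root = ET.fromstring(PREDICTIVE_SCENARIO)

predicted_roles = [e.findtext("ld:title", namespaces=NS)
                   for e in root.iterfind(".//ld:roles/ld:learner", NS)]
predicted_activities = [e.findtext("ld:title", namespaces=NS)
                        for e in root.iterfind(".//ld:activities/ld:learning-activity", NS)]

print(predicted_roles)       # ['Project leader', 'Team member']
print(predicted_activities)  # ['Plan the web project', 'Design the site structure']
```

The predicted role titles can then be compared with the roles emerging from the communication tools' tracks, as discussed for data S-3.5 and S-3.6.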
DATA ANALYSIS
Analysts, and sometimes tutors, analyze data in order to synthesize the information it contains. The results of these different analyses are then used by many actors of our e-learning system: analysts use them to produce new analyses, tutors use them to evaluate learners, and designers use them to improve their predictive scenario (following a reengineering cycle) and to capitalize knowledge so as to produce new scenarios (for engineering purposes). In the previous section, we described the data (both primary and derived) necessary to produce the 'collaboration quality' indicator. We will now detail how those data are analyzed to produce this indicator. We recall that all dependencies between data are depicted in Figure 5.
Who Analyses Data, How and When?
Presently, all analyses are made by the observed uses modelers (who analyze raw data) and the observed uses analysts (who work from the analysis reports made by the observed uses modelers). In this first step of our experiment, most of our analyses are done manually, at the end of a learning session. We will first detail the analyses made by observed uses modelers. The analysis of data S-1.2 (data arising from chat), that is, the tracks produced by learners through their interactions in the chat system, is done using pragmatic markers (Cottier & Schmidt, 2004) in order to identify emerging roles. The analysis of data S-1.3 (data arising from newsgroups) is very similar: the tracks produced by learners through their interactions in newsgroups are analyzed with pragmatic markers at the end of the session. Data S-3.2 (role emergence from chat) and data S-3.3 (role emergence from newsgroups) are then analyzed together: the role lists arising from both data are merged into one list, which is then enriched with the annotations (learners in role) that they contain. The analysis of data S-2.1 (data arising from questionnaires) is made by observed uses modelers and consists of synthesizing the answers to the questionnaires as percentages and reporting them in an evaluation grid. We will now turn to the analyses made by observed uses analysts. Their analysis mainly consists of synthesizing information from data S-2.1, S-3.4, S-3.5 and S-4.1 in order to produce the collaboration quality indicator (that is, data S-3.6). Since data S-2.1 (data arising from questionnaires) also contains detailed answers to the questionnaires, it can be used to make a synthesis concerning collaboration inside the group. The analysis of data S-3.4 (questionnaire synthesis) is carried out by the human analyst to highlight whether collaboration takes place or not (focusing on learners' abilities to collaborate and to work in a group).
For this purpose, questionnaires were built allowing the evaluation of variables such as participation, collaboration, communication, work atmosphere, and leadership (see Figure 14). The analysts also need to analyze data S-3.5 (data related to new roles arising from learners' activity), since they need information regarding the match between effective roles and their socio-affective facets. For example, a learner having a functional role of project manager would ideally have an organizer or leader socio-affective role and be rather active (whereas with a follower socio-affective role, he would be less successful in his task). Finally, they need to analyze data S-4.1 in order to compare the roles arising from the activity with those predicted by the instructional designers. For the moment, this analysis is done at the end of a learning session. Nevertheless, the results obtained suggest that it would be judicious to detect functional and socio-affective role mismatches during the session, in order to react as soon as a mismatch is detected. This would imply adopting a formative evaluation rather than a summative one (especially for the questionnaires). It would also imply automating role extraction with pragmatic markers. We are presently working on this, building an ontology of pragmatic markers: we have already extracted a first subset of lexical fields and dialog topics and we are currently working on extending them.
Who Uses the Results of the Analysis, How and for Which Kind of Action?
The results of the different analyses are used by many actors of our e-learning system. Analysts use them in order to produce new analyses, tutors use them to evaluate learners, and designers use them to improve their predictive scenario (following a reengineering cycle) and to capitalize knowledge so as to produce new scenarios (engineering purpose). Therefore, motivations in data analysis can be viewed from one of the following viewpoints: engineering, reengineering, or learners' regulation.
Figure 14. “Leadership” evaluation with questionnaire
Moreover, not all of our analyses are carried out with the same goal: some of them are only intermediary results used to produce new data, whilst others are indicators that have a meaning by themselves. We will first detail how analysts (both observed uses modelers and observed uses analysts) use the results of previous analyses in order to build new data. First, observed uses modelers are in charge of formatting the roles identified by analysts using pragmatic markers on chat messages (data S-1.2) and newsgroup messages (data S-1.3); they produce, respectively, data S-3.2 and data S-3.3. Observed uses modelers also need to format the percentages calculated by analysts from the questionnaires (data S-2.1) using a synthesis grid. The results of the analysis of data S-3.2 and data S-3.3 are manually formatted in order to constitute a basis for data S-3.5. Then the results of the analysis of the detailed answers in data S-2.1 are used by observed uses analysts to confirm the roles arising from data S-3.2 and data S-3.3. For example, in our questionnaire we asked students about leadership: students have to explain whether this role was assumed by one particular student and to express their opinion about the commitment of their leader. If most student answers are similar, pointing out the same student, we can consider that this student has a leader role, which confirms that this role emerged from the activity. We will now detail how indicators are (or can be) used from the following viewpoints: engineering, reengineering, and learners' regulation. In our experiment, we have identified three indicators: acquired knowledge quality (data S-3.7), system quality (data S-3.9) and collaboration quality (data S-3.6). We will now focus on this last one, which comes from the joint analysis of data S-2.1, data S-3.4, data S-3.5 and data S-4.1. From an engineering viewpoint, this indicator will be useful for instructional designers to capitalize knowledge and produce new collaborative scenarios. From a reengineering viewpoint, this indicator will also be useful to instructional designers as it allows them to improve their predictive scenario, taking into consideration the effective collaboration that has been observed. Lastly, from a regulation viewpoint, this indicator could be used by moderator tutors in order to correct collaboration problems that emerge during the activity. Nevertheless, this use implies that the indicator be computed during the learning session, which is not yet the case. One could also expect these indicators to be used by assessor tutors to attribute (at least partially) a grade to learners but, in our experiment, assessor tutors have rejected this use as too subjective. More precisely, they pointed out that they were afraid of penalizing students, since this indicator corresponds to an evaluation of the collaboration within the whole group (rather than an individual assessment). They nevertheless point out that such an indicator could be a great help from a regulation viewpoint (if it can be computed during the session rather than at the end of a learning session).
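The questionnaire-based confirmation described above can be pictured with a short sketch: the leader emerging from chat analysis is confirmed only if a majority of questionnaire answers nominate the same student. Student names and the majority threshold are invented for the example.

```python
from collections import Counter

def confirm_leader(emergent_leader, nominations, threshold=0.5):
    """nominations: list of students named as leader in the questionnaires'
    detailed answers (one entry per respondent)."""
    if not nominations:
        return False
    most_named, count = Counter(nominations).most_common(1)[0]
    return most_named == emergent_leader and count / len(nominations) > threshold

# Emergent leader from data S-3.2 vs. questionnaire nominations from data S-2.1.
print(confirm_leader("student_B", ["student_B", "student_B", "student_A", "student_B"]))  # True
```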
From Data Analysis to Design Patterns
Design Patterns were originally developed in architecture, then in software engineering, and one now finds design patterns for communication interactions and e-learning issues: human computer interaction, web design, pedagogical patterns, and patterns for implementing an institutional e-learning centre1. Design Patterns embody the design experience a community has developed and learned. They describe recurrent problems, the rationale for their solution, how to apply the solution and some of the trade-offs in applying it. In this article, we have already shown a way to produce a collaboration quality indicator in order to evaluate whether collaboration takes place in a group and to detect functional vs. socio-affective role mismatches among learners. We will now use design patterns to formalize both our observation problem and how the 'collaboration quality' indicator can be used to tackle it. We use the design pattern language (Celorrio & Verdejo, 2005) finalized by the DPULS project (DPULS, 2005). We nevertheless need to modify this language slightly so that it better fits our needs. Firstly, the DPULS language defines the indicators used by the described solution; we need a little more information on indicators and, in particular, to distinguish whether they are used as an input to achieve the solution or are built during the solution achievement and are thus part of the solution. The second point is related to the different viewpoints from which indicators can be used. In the previous section, we have seen that indicators can be used from the following viewpoints: engineering, reengineering, and learners' regulation. Thus, we need to incorporate these viewpoints in our design pattern language.
Emergence of Roles by Tutor
The work done in order to produce data S-3.2 and data S-3.3, that is, extracting emerging roles from the transcription of communications between learners using pragmatic markers (Cottier & Schmidt, 2004), is not really dependent on the communication tool used. Indeed, the work done with data arising from newsgroups is very similar to the work done with data arising from chat. The key point of this analysis is the method, that is, the use of pragmatic markers. This idea can be abstracted (or captured) with the design pattern shown in Figures 15-19. This pattern is thus an abstraction of the process used to produce data S-3.2 and S-3.3. But one can also use this DP in order to define an automated service that extracts emerging roles from communication tools' tracks. In order to define such a service, you must provide two elements: (i) communication tool tracks, and (ii) pragmatic markers (for your domain and tailored to the specific roles you want to observe) organized in an ontology (obtained using our "C2.1 - Emergence of roles by tutor" Design Pattern).
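A minimal sketch of the two extensions to the DPULS language discussed above might represent each indicator referenced by a pattern with an "input vs. built" flag and a set of usage viewpoints. The field names below are our own invention, not part of the DPULS specification.

```python
from dataclasses import dataclass, field

@dataclass
class IndicatorRef:
    name: str            # e.g. "collaboration quality (S-3.6)"
    usage: str           # "input" (needed to achieve the solution) or "built" (part of the solution)
    viewpoints: set = field(default_factory=set)  # subset of {"engineering", "reengineering", "regulation"}

@dataclass
class DesignPatternEntry:
    identifier: str
    title: str
    indicators: list = field(default_factory=list)

c21 = DesignPatternEntry(
    identifier="C2.1",
    title="Emergence of roles by tutor",
    indicators=[
        IndicatorRef("communication tool tracks (S-1.2, S-1.3)", "input"),
        IndicatorRef("emerging roles (S-3.2, S-3.3)", "built", {"reengineering", "regulation"}),
    ],
)
print([(i.name, i.usage) for i in c21.indicators])
```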
Figure 15. C2.1 DP > General section
Figure 16. C2.1 DP > Problem section
Collective Organization of a Synchronous Activity
Another key point of our data analysis can be abstracted in a similar way: data S-3.6 (collaboration quality). First of all, the building of data S-3.6 uses data S-3.2 and data S-3.3, which are covered by our first design pattern. Moreover, we use in this process two indicators formalized in (Manca, Persico, Pozzi, & Sarti, 2005a, 2005b). These two indicators can be used as a substitute for (and an enhancement of) data S-3.4 (questionnaire synthesis). The first one is active participation and aims at counting the actions showing active participation by students in a given area or course; it takes into account three main activities: sending a message, uploading a document, and attending a chat. The second one is passive participation and aims at counting the passive acts performed by students (e.g., reading a message, downloading a document). This process leads to the design pattern shown in Figures 20-24.
Figure 20. C2 DP > General section
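The two participation indicators borrowed from Manca et al. can be approximated by counting events in the activity log, as in the sketch below. The event names and log format are invented; the real indicators are defined in the DPULS know-how list.

```python
ACTIVE_EVENTS = {"send_message", "upload_document", "attend_chat"}
PASSIVE_EVENTS = {"read_message", "download_document"}

def participation(log):
    """log: list of (student, event) tuples from the learning environment.
    Returns {student: {"active": n, "passive": n}}."""
    counts = {}
    for student, event in log:
        entry = counts.setdefault(student, {"active": 0, "passive": 0})
        if event in ACTIVE_EVENTS:
            entry["active"] += 1
        elif event in PASSIVE_EVENTS:
            entry["passive"] += 1
    return counts

log = [
    ("student_A", "send_message"),
    ("student_A", "upload_document"),
    ("student_B", "read_message"),
    ("student_B", "download_document"),
    ("student_B", "send_message"),
]
print(participation(log))
# {'student_A': {'active': 2, 'passive': 0}, 'student_B': {'active': 1, 'passive': 2}}
```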
In a collaborative e-learning system, the tracks arising from communication tools allow us to build indicators that are useful for all system actors. Indeed, an indicator like collaboration quality (data S-3.6) can at once be used by tutors to evaluate learners, by analysts to build other indicators, and by designers to evaluate the relevance of their pedagogical scenarios. From this last point of view, we have shown in this article that considering the emerging roles arising from communication tools' tracks can be useful for reengineering purposes. For example, in our experiment, we have carried out a first reengineering cycle, which allowed us to enrich the predictive scenario made by designers by adding the socio-affective roles arising from the analysis of learning session tracks. Role emergence was one key point of our reengineering process, and went hand in hand with the comparison of predictive scenarios and descriptive ones enriched with emerging roles.
Another interesting point is that the proposed indicators can be used in a more general framework than that of our experiment. Indeed, mining roles from communication tools' tracks can help to shed light on the effective use of the collaborative system and to push the collaboration quality indicator further, whatever the collaborative experiment may be. These general frameworks then lead to the definition of design patterns. Moreover, in order to support the production of such generic indicators, we have defined software tools (Iksal, Barré, Choquet, & Corbière, 2004) that, once fully developed, will allow the analysis of the collected data based both on the predictive scenario and on a formal description of the elements to be observed. They will produce formal representations of user behavior, based on observation needs, and thus form a useful guide for implementing the reengineering process. Finally, although we have not formalized our methodology for defining Design Patterns, we think our approach could be generalized and applied to other experiments. Indeed, the participants of the DPULS project (ten research teams) have more or less employed the same methodology on their own experiments and the result was the definition of a structured set of forty Design Patterns3. We hope this result can be the first step of a process for capitalizing and sharing knowledge on usage analysis throughout the Technology Enhanced Learning community.
References
Babbie, E. (1992). The practice of social research. Belmont, CA: Wadsworth.
Betbeder, M.-L., & Tchounikine, P. (2003). Symba: A framework to support collective activities in an educational context. International Conference on Computers in Education (ICCE 2003), December 2-5, 2003, Hong Kong (China), 188-196.
Celorrio, C., & Verdejo, F. (2005). Deliverable 32.5.01: The design pattern language. DPULS Project, Kaleidoscope NoE. [On-line]. Available: http://www.noe-kaleidoscope.org/
Chikofsky, E. J., & Cross II, J. H. (1990). Reverse engineering and design recovery: A taxonomy. IEEE Software, 7(1), 13-17.
Corbière, A., & Choquet, C. (2004a). Designer integration in training cycles: IEEE LTSA model adaptation. International Conference on Computer Aided Learning in Engineering Education (CALIE'04), February 16-18, 2004, Grenoble (France), 51-62.
Corbière, A., & Choquet, C. (2004b). A model driven analysis approach for the re-engineering of e-learning systems. ICICTE'04, July 1-3, 2004, Samos (Greece), 242-247.
Cottier, P., & Schmidt, C. T. (2004). Le dialogue en contexte: Pour une approche dialogique des environnements d'apprentissage collectif [Dialogue in context: Towards a dialogical approach to collective learning environments]. Colloque ARCo 2004, December 8-10, 2004, Compiègne (France).
DPULS (2005). Design patterns for recording and analyzing usage in learning systems. Workpackage 32, Kaleidoscope Network of Excellence, supported by the European Community under the Information Society and Media Directorate-General, Content Directorate, Learning and Cultural Heritage Unit. Contract 507838. Consulted June 2006, at http://www.noe-kaleidoscope.org/
Henri, F., & Lundgren-Cayrol, K. (2001). Apprentissage collaboratif à distance: Pour comprendre et concevoir les environnements d'apprentissage virtuels [Distance collaborative learning: Understanding and designing virtual learning environments]. Sainte-Foy (Québec, Canada): Presses de l'Université du Québec. ISBN 2-7605-1094-8.
Hotte, R. (1998). Modélisation d'un système d'aide multiexpert pour l'apprentissage coopératif à distance [Modeling a multi-expert support system for distance cooperative learning]. Unpublished doctoral dissertation, Université Denis Diderot/Paris 7.
Iksal, S., Barré, V., Choquet, C., & Corbière, A. (2004). Comparing prescribed and observed for the re-engineering of e-learning systems. IEEE Sixth International Symposium on Multimedia Software Engineering, December 13-15, 2004, Miami (USA).
Jonassen, D. H., Davidson, M., Collins, M., Campbell, J., & Bannan Haag, B. (1995). Constructivism and computer-mediated communication in distance education. Journal of Distance Education, 9(2), 7-27.
Koper, R., Olivier, B., & Anderson, T. (2003). IMS Learning Design v1.0 Final Specification [On-line]. Available: http://www.imsglobal.org/learningdesign/index.html
Lejeune, A., & Pernin, J.-P. (2004). A taxonomy for scenario-based engineering. Cognition and Exploratory Learning in Digital Age (CELDA 2004), December 2004, Lisboa (Portugal), 249-256.
Manca, S., Persico, D., Pozzi, F., & Sarti, L. (2005a). An approach to tracking and analyzing interactions in CSCL environments. Proceedings of the E-learning Conference, 2005, Berlin (Germany).
Manca, S., Persico, D., Pozzi, F., & Sarti, L. (2005b). Deliverable 32.4.01: Know-how list. DPULS Project, Kaleidoscope NoE. [On-line]. Available: http://www.noe-kaleidoscope.org/
Morch, A., & Mehandjiev, N. D. (2000). Tailoring as collaboration: The mediating role of multiple representation and application units. CSCW'2000, December 2-6, 2000, Philadelphia, Pennsylvania (USA), 75-100.
Acknowledgments
This work has been done within the framework of the DPULS project (DPULS, 2005), funded by the Kaleidoscope Network of Excellence supported by the European Community.
Notes
1. e-LEN project, see http://www2.tisip.no/E-LEN/patterns_info.php (last consulted April 2006).
2. Pattern C1 consists in evaluating whether, and to what extent, collaboration is taking place, using quantitative and qualitative analysis of interactions; Pattern C2.2 differs from pattern C2.1 in that it uses an automated service to analyze communication tool tracks.
3. These Design Patterns are accessible at: http://lucke.univ-lemans.fr:8080/dpuls/login.faces
A Structured Set of Design Patterns for Learners' Assessment
ÉLISABETH DELOZANNE
Université Paris-Descartes, France
[email protected]
FRANÇOISE LE CALVEZ
Université Paris 6, France
[email protected]
AGATHE MERCERON
Technische Fachhochschule Berlin, Germany
[email protected]
JEAN-MARC LABAT
Université Paris 6, France
[email protected]
In this article we present a structured set of Design Patterns (DPs) that deals with tracking students while they solve problems in specific domains such as programming or mathematics. Our collection of 17 DPs yields a three-step approach. First step: to collect and analyze information on each student for each exercise; second step: to build a higher-level view of one student's activity on a set of exercises; third step: to build an overview of the whole class activity. To evaluate our DP set, we investigate whether our DPs account for experiences from the literature, as a first step toward a pattern language for students' assessment.
Introduction
The usage of learning systems is a large research field and there is a lot of scattered work on this issue. In our work we assume that a Design Pattern approach is a way to collect and to share experiences, to support meta-reflection and to capitalize on context-specific research results. The first set of Design Patterns was suggested by Alexander (Alexander et al., 1977) in the architecture domain. Alexander's definition of a Pattern is still a reference:
Each pattern describes a problem which occurs over and over again in our environment and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice (1977, p. x).
In Alexander's perspective, a network of related patterns creates a language to be used by everyone involved in a design process, whether one designs a house for oneself or works with others to design offices or public spaces. In the last three decades, the Pattern approach has found its way into many disciplines such as architecture, software engineering, human computer interaction, website design, and e-learning (Alexander et al., 1977; Buschmann, Meunier, Rohnert, Sommerlad, & Stal, 1996; Schmidt, Stal, Rohnert, & Buschmann, 2000; Avgeriou, Papasalouros, Retalis, & Skordalakis, 2003; van Duyne, Landay, & Hong, 2003; Chung, Hong, Lin, Prabaker, Landay, & Liu, 2004; van Welie, 2004; Deng, Kemp, & Todd, 2005). In the e-learning research field, collections of pedagogical patterns are now available. The Pedagogical Pattern Project (PPP) (PPP, n.d.) provides three collections of patterns for educational scenarios. They aim to capture experts' practice, in this case that of experienced teachers. These high-level patterns use an Alexandrian format and are narratives expressed in a you-form to address academic teachers' or industry instructors' problems. One collection of fourteen patterns is presented as a step towards a pattern language for computer science course development: teaching from different perspectives, active learning, feedback patterns, patterns for experiential learning and patterns for gaining different perspectives. A second set of forty-eight patterns forms a pattern language to teach seminars effectively. A last set of five patterns about running a course is suggested. PPP patterns do not explicitly address the use of technology. The E-LEN project (Avgeriou et al., 2003; Avgeriou, Vogiatzis, Tzanavari, & Retalis, 2004; Goodyear et al., 2004; E-LEN, n.d.) provides a booklet with guidelines to develop Design Patterns for e-learning. It also provides a repository of forty patterns classified into four different special interest groups (SIG): learning resources and learning management systems (LMS), lifelong learning, collaborative learning and adaptive learning. They aim to construct a knowledge base for educational designers and they promote a Design Pattern approach to collect and disseminate re-usable design knowledge, methods of sharing design experience and support for the work of multidisciplinary teams. Their DP users are e-learning designers. Their DPs are high-level patterns. For instance, the DP student tracking suggests the functionalities to be implemented in the system, but it is not clear how to implement these functionalities. It is the designer's responsibility to generate a solution adapted to his/her context and, eventually, to create a more specialized Design Pattern if he/she detects invariants in his/her solution.
"A pattern suggests rather than prescribes a solution" (E-LEN, n.d.). The "Design Patterns for Recording and Analysing Usage of Learning Systems" (Choquet, 2004) (DPULS) project is part of the Kaleidoscope European Network of Excellence. Its objective was to come up with a set of Design Patterns (DPs) that allows the tracking of actors' activity. As in the E-LEN project, the goal is to support good design decision-making but also to share experiences and to build a common language between a European set of research teams. The DPULS DPs focus on data collection and analysis in order to investigate the student tracking problem more deeply and to complement the PPP and E-LEN projects. In this article we present the subset of the DPULS DPs that deals with tracking students' know-how and strategies while they solve problems in specific domains such as programming or mathematics. This subset is called Learners' Assessment DPs (LA DPs for short). We first specify our research goals and methodology and compare them to related works. After introducing three different successful practices that grounded our DP design, we present the structured set of DPs on Learners' Assessment. We end with a discussion comparing our approach to others and draw some perspectives.
Research Goals and Methodology
In communities that have adopted the Design Pattern approach, there is large agreement on the definition of a pattern as a solution to a recurrent problem in a context (Goodyear et al., 2004). However, many issues are debated, for example:
• Who are DP users? DPs are written to support experts, researchers, members of a community of practice, multidisciplinary teams, end-users of the designed product or even a machine that implements the solution.
• Why create a DP set? The purposes can vary, for example, to create a common language in a multidisciplinary or multicultural design or research team, to capture/disseminate expertise, to capitalize on previous experiences, to ensure quality, or to teach.
• What is the format to express DPs? Many efforts are devoted to defining DP form (narrative, UML or XML schemes), structure, naming and referencing.
• How to create a DP? The bottom-up approach is the most common, but some authors suggest a top-down approach or a "bottom-up approach informed by theory" (E-LEN, n.d.).
• How to create a DP language? Structure, granularity, DP combination, and specificity are key points to be discussed.
• How to validate or evaluate a DP language? Several ways of validation are presented in the literature: the "rule of three" (three examples of successful experiences using the pattern), peer review, feedback from practitioners and usability testing.
Design Patterns are usually drafted, shared, criticized and refined through an extended process of collaboration. To this end, a five-step methodology was adopted in DPULS. First, we studied a number of practices in our context of interest (DPULS, 2005, 2-3). Second, from these practices, we selected a set of common and recurrent problems and we set up a know-how list (DPULS, 2005, 4). Third, we worked on descriptions of problem statements and solutions general enough to cover each individual experience, which was a major task. From this step we negotiated a common format to express the set of DPs (DPULS, 2005, 5) and designed a browser to support navigation in this set of DPs (DPULS, 2005, 7). Fourth, at the same time, we worked to reach an agreement on whether and how the different problems were linked; many iterations were necessary between the third and the fourth steps. Fifth, we came up with a set of DPs stable enough to account for other similar practices (DPULS, 2005, 6) and we entered the patterns in the browser. Every step involved interactions between partners. The purpose of the DP set presented here is threefold:
• To express invariance in the solutions experimented with by DPULS teams to solve assessment problems;
• To account for others' experiences on the same class of problems using our pattern language;
• To support designers who deal with learners' assessment in individual learning systems.
Here is a scenario that illustrates the sort of problem practitioners might face when designing a system for understanding a student's actual learning activity. Sarah is a teacher who organizes lab work for a group of students. She wants a report on the lab work session to check whether some know-how has been mastered and to get an insight into the kind of strategies the students have used. Her aim is to plan the next session and to adapt her teaching to her students' understanding. She needs an overview of the students' activity during the lab session.
How should the system be designed to make this scenario possible? Our DPs yield a three-step approach:
• First step: to collect and analyze information on each student for each exercise;
• Second step: to build a higher-level view of one student's activity on a set of exercises;
• Third step: to build an overview of the whole class activity.
In e-learning systems, there are different assessment approaches. Our approach goes beyond most current assessment practices, where assessment allocates grades and is made through quizzes, multiple choice questions or numerical answers, e.g., "management of on-line questionnaire" (Avgeriou et al., 2003). We also aim to assess students' productions when they are asked to perform tasks specially designed to make the taught knowledge personally meaningful. We deal with problems where students have to perform complex cognitive operations. This type of assessment requires more than a right/wrong judgment: it requires a multidimensional analysis based on accurate pedagogical, didactical or cognitive studies of the student's activity. In our scenario, this point can be illustrated as follows: Sarah is an experienced teacher. She is aware of students' personal interpretations and, from her experience or from cognitive research results, she has derived a typology of students' answers to a class of problems. She wants to group her students using this typology.
The DPULS set of DPs captures experiences from a multidisciplinary and multicultural team of researchers2. It was built with a bottom-up approach to capture the participants' expertise and to integrate European research works. In this article we present DPs in a narrative format for human readability; a machine-readable version exists that is processed by a DP Browser (DPULS, 2005, 7). They are high-level patterns validated by peer review and by at least the "rule of three".
Background Experiences
What characterizes a set of DPs is not the originality of its content: usually most of the ideas are known, since they are proven solutions to well-identified problems. Rather, the merit of a set of DPs is to bring together and structure existing best practices. In our work, we first generalized from six very different e-learning assessment experiences in the AIDA team3. Six projects deal with students' assessment: Combien? (Combien, 2000), Diane (Hakem, Sander, & Labat, 2005), Java Course (Duval, Merceron, Scholl, & Wargon, 2005), Logic-ITA (Merceron & Yacef, 2004), Pépite (Pepite, 2000), and Math Logs (Vandebrouck, Cazes, Gueudet, & Hersant, 2005). In this section, we present three of these experiences so that readers can give concrete content to the DPs presented in the next section. We selected them to show a large range of assessment contexts. Combien? is a learning environment to train undergraduate students in combinatorics. Pépite is a diagnosis system that analyzes students' productions to diagnose their algebraic competence in secondary school algebra. Logic-ITA is a web-based tutoring system for training second-year university students in propositional logic.
Experience 1: The Combien? Project (Combien, 2000)
Combien? trains students to solve combinatorics exercises and to justify their solutions. In combinatorics (counting the number of ways of arranging given objects in a prescribed way), the main part of the solving process does not come from a clever chaining of inferences or calculations, but from the elaboration of a suitable representation and from the transformation of one representation into an equivalent one. This work is based on research that emphasizes the importance of representations in problem solving. Combien? is grounded in the constructive method (Le Calvez, Giroire, Duma, Tisseau, & Urtasun, 2003), a problem-solving method set up by a group of teachers and adapted to usual student conceptions in order to give students access to the mathematical theory of the domain. Students are asked to build a generic element (called a configuration) of the solution set by describing their construction as a set and a set of constraints. Then, they have to reason about this construction to find the numerical solution. Combien? offers one interface for each class of exercises. At each step of the construction, the system automatically determines whether the students' ongoing solution leads to a right construction or not; in the latter case, it gives hints to help students understand their errors. Each year, one hundred second-year university students use the system in lab sessions. Students can also work with Combien? at home for personal training. All the students' actions (data input and validation) and their time stamping are recorded so that the session can be replayed by the system. Combien? detects students' errors, classifies them, and records their type and their characterization. All this information is recorded in XML files analyzed a posteriori. Combien? offers two types of analysis. For each student, Combien? presents a detailed account of her session: for each exercise, it reports the total solving duration, the number of errors and their types, the number of hesitations, the exercise achievement, and the number of course consultations. This analysis is available to both students and teachers. In addition, Combien? produces a classification of the exercises according to their level of difficulty and also groups students according to the types of errors they made and their success rates on the exercises. Teachers use these classifications to define learning activities adapted to each group.
Figure 1. One problem-solving step with Combien?
Experience 2: The Pépite Project (Pepite, 2000)
Pépite is an application that collects students' answers to a set of exercises and builds a cognitive profile of their competence in algebra. It is based on educational research that identified learning levers and obstacles in students' algebra learning (Delozanne, Prévit, Grugeon, & Jacoboni, 2003; Delozanne, Vincent, Grugeon, Gélis, Rogalski, & Coulange, 2005). A teacher gives a test to her students and Pépite provides her with three outcomes: first, an overview of the whole class obtained by grouping her students
according to identified strong and weak points; second, a detailed report of each student's cognitive profile; and third, a list of learning activities tailored to each group. Four hundred students took the Pépite test. Several classes of users used the Pépite diagnosis. Math teachers used Pépite to monitor learning activities in the classroom, but also as a basis for a dialog giving personalized feedback to a single student and as an entry to a meta-reflection on her algebraic competence. Educational researchers used Pépite to identify stereotypes of students and to define teaching strategies appropriate to each stereotype. Designers used these experiments to improve the software design: collecting more students' answers helps to strengthen the automatic diagnosis, while analysing teachers' uses helps to better understand teachers' needs and to offer better support for teachers' activity. Pépite automatically collects students' answers to the exercises. Answers are expressed by algebraic expressions, by a whole algebraic reasoning, by the students' own words, by multiple choices or by clickable areas. The students' assessment is a three-step process.
First, each student's answer is coded according to a set of 36 criteria on 6 dimensions (Figure 2): treatments (correct, incorrect, partially correct, not attempted, not coded), meaning of letters (unknown, variable, generalized number, abbreviation or label), algebraic calculation (e.g., correct usage of parentheses, incorrect usage of parentheses, incorrect identification of + or x, etc.), conversion (ability to switch between various representations: graphical, geometrical, algebraic, natural language), type of justifications ("proof" by example, proof by algebra, proof by explanation, "proof" by incorrect rule), and numerical calculation. The Pépite software automatically codes 80% of the answers4. For each student, an XML file stores the student's answers and the system coding for each exercise. A graphical interface enables the teacher to check or correct the system coding. Second, a detailed report of the student's cognitive profile is built by collecting the same criteria across the different exercises so as to have a higher-level view on the student's activity. It is expressed by success rates on three dimensions (usage of algebra, translation from one representation to another, algebraic calculation) and by the student's strong points and weak points on these three dimensions. Third, the student's profile is used to evaluate a level of competence in each dimension, with the objective of situating the student in a group of students with an "equivalent" cognitive profile. By equivalent we mean that they will benefit from the same learning activities.
Figure 2. Pépite automatic coding of a student’s answer on six dimensions
For instance, a student is "Usage of algebra level 1, Translation level 2, Algebraic calculation level 2" when she used algebra to justify, to generalize and to formulate equations, she sometimes articulated relations between variables with algebraic expressions and linked algebraic expressions to other representations, and she showed abilities in algebraic calculation in simple and well-known situations but still used some incorrect rules.
Experience 3: The Logic-ITA
The Logic-ITA (Merceron & Yacef, 2004) is a web-based tutoring system to practice formal proofs in propositional logic. It is based on Cognitive Load Theory: the practice of many exercises should help students to build problem-solving schemata, see (Sweller, van Merrienboer, & Paas, 1998). The Logic-ITA has been used by hundreds of students from Sydney University in their second year of studies. It is offered as an extra resource to a face-to-face course. The system has a database of exercises and an exercise generator, and students can also enter their own exercises. A formal proof exercise consists of a set of formulas called premises and a special formula called the conclusion. Solving an exercise is a step-by-step process in which students have to derive new formulas, using premises or already derived formulas and applying logic rules to them until they reach the conclusion. The system integrates a module with expertise in logic. It automatically evaluates each step of the student's solution; in particular, it checks that the logic rule chosen by the student is applicable and that the result of its application matches the formula entered by the student. The system provides the student with contextualized feedback and gives her a hint in case of a mistake. Students practice as they wish; training is neither assessed nor marked. There is neither a fixed number nor a fixed set of exercises made by all students. For each exercise attempted by a student, the system records in a database the time and date, all the mistakes made, all the logic rules that were correctly used, the number of steps entered and whether or not the student has successfully completed the exercise. The database can be queried and mined to find pedagogically relevant information.
Figure 3. Screenshot of the Logic-ITA while a student is about to reach the conclusion
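As an illustration of the kind of querying mentioned above, the sketch below builds a small in-memory SQLite table of attempt records and asks for the most frequent mistakes. The table schema and values are invented; the real Logic-ITA database is richer than this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE attempts (
    student TEXT, exercise TEXT, mistake TEXT, completed INTEGER)""")
conn.executemany(
    "INSERT INTO attempts VALUES (?, ?, ?, ?)",
    [
        ("s1", "ex1", "rule not applicable", 0),
        ("s1", "ex1", None, 1),                     # second attempt, no mistake, completed
        ("s2", "ex1", "wrong resulting formula", 0),
        ("s2", "ex2", "rule not applicable", 1),
    ],
)

# Most frequent mistakes across all attempts.
for mistake, count in conn.execute(
    "SELECT mistake, COUNT(*) FROM attempts WHERE mistake IS NOT NULL "
    "GROUP BY mistake ORDER BY COUNT(*) DESC"
):
    print(mistake, count)
# rule not applicable 2
# wrong resulting formula 1
```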
A Structured Set of DPs
In this section, we present the subset of DPULS DPs on Learners' Assessment that we derived from our experience. The DPULS format was defined to allow automatic browsing and searching tools to be built; here we present the DPs in a simplified format to ease communication. Each DP has a name that describes both a problem and a solution (Alexander et al., 1977; Meszaros & Doble, 1998). Combining all the DP names forms a pattern language: you can use the DP names to describe your design for solving problems in your context. In the Learners' Assessment category we structured our set of DPs into five hierarchies. In each hierarchy, the DP at the top deals with a problem and all the DPs below are alternative patterns that solve the same problem with different or complementary approaches. Bottom-level patterns are more specialized than upper ones. Our five hierarchies are not stand-alone; patterns can be combined to solve a design problem. Let us consider Sarah's scenario. To provide Sarah with an overview of the students' activity during the lab session, Aminata, a member of the LMS design team, starts with the DP "overview of the activity of a group of learners on a set of exercises" (DP LA4, top of the fourth hierarchy). She reads the DP solution description and finds that there are several kinds of overview (lower DPs in the fourth hierarchy) according to the user's objective when asking for an overview. She decides to get more information about the users' needs. Then, in the related patterns section, she notices that this DP has some pre-requisites: it needs the results of the DP "overview of a learner's activity across a set of exercises" (DP LA2, top of the second hierarchy). Indeed, in an individual learning system the group evaluation is based on each individual's evaluation. Likewise, there are several alternative or complementary overviews of a student's activity (lower DPs in the second hierarchy) and this DP uses the results of the DP "analysis of a learner's solution on a single exercise" (DP LA1, top of the first hierarchy). This DP gives several solutions to derive information from the students' logs.
Figure 4 shows the subset of DPULS DPs focusing on Learners’ Assessment. In the following section, we detail only the patterns mentioned in the above scenario to help Aminata solve her problem. In each pattern we illustrate the results with the three AIDA experiences: Combien?, Pépite, Logic-ITA.
DP LA1 Multidimensional Analysis of a Learner's Solution to a Single Exercise
DP LA1.1 Pattern matching to analyze the learner's solution
DP LA1.2 Specific software to analyze the learner's solution
DP LA1.3 Human assessor to check the automatic analysis of the learner's solution
DP LA2 Overview of a learner's activity across a set of exercises
DP LA2.1 The learner's strong and weak points
DP LA2.2 The learner's situation on a predefined scale of skills or competence
DP LA2.3 The learner's progression in an individual learning activity
DP LA2.4 The learner's autonomy in an individual learning activity
DP LA2.5 The learner's performance in an individual learning activity
DP LA3 Overview of the activity of a group of learners on a single exercise
DP LA4 Overview of the activity of a group of learners on a set of exercises
DP LA4.1 Automatic clustering
DP LA4.2 Relations between errors, success or usage
DP LA4.2.1 Association rules
DP LA5 Playing around with learning resources
DP LA5.1 Browsing use of a MCQ
Figure 4. The structured set of the Learners' Assessment DPs (in bold the subset discussed here)
DP LA1 Multidimensional Analysis of a Learner's Solution to a Single Exercise
Abstract: This pattern provides several approaches to automatically assess a learner's solution to an online solved problem. You can merely assess the correctness of the solution or enrich the assessment with other dimensions such as the strategy used, abilities, hesitations, categorization of errors, etc.
Context: The learner answered a single question or solved a single exercise. The answer was recorded, as well as usage data (time spent, actions, help requests, etc.). The local analysis of the learner's answer can be immediate (e.g., if your system provides feedback) or delayed (e.g., in a diagnosis system).
Problem: How to automatically assess a learner's solution to a problem or to one step of a problem solution? Or, if this is not possible, how to support human assessment? The problem is how the system can analyze, correct, comment on or classify the learner's answer. If your system asks learners to solve complex problems, it will let them build their own solution. In that case it is often impossible to predict the exact form of the learner's answer, because of the combinatorial explosion of possibilities or because of learners' cognitive diversity.
Solution: If your objective is to assess the correctness of a learner's answer, then you can provide an indicator like a grade or a code for success, failure or partial failure. If your objective is to provide feedback or to make a cognitive diagnosis, you may need a deeper characterization of the learner's answer.
For instance, you may need information on the strategy used by the learner, on the skills evidenced by the learner's solution, on the categories of mistakes made by the learner or on the learner's hesitations. This multidimensional characterization of the learner's solution is often domain dependent. The analysis builds a composite indicator: it is often a set of codes identifying the learner's solving process and a list of errors.
Results: If you implement this pattern, you will characterize the learner's answer with some of the following items:
• A grade for the learner's answer to an exercise. The grade can be a mark like A, B, C, D, or a message like correct, partially correct, incorrect;
• Cognitive characteristics of the learner's answer;
• A set of codes (identifying the learner's solving process);
• A list of errors.
Discussion: You may need an expert (an experienced teacher, a cognitive psychologist or an educational researcher) to define a model of competence, a task model and/or a typology of common errors linked to the problem to be solved by the learner.
Examples:
• Pépite automatically computes a code to assess the answers on up to six dimensions: correctness, meaning of letters (unknown, variable, generalized number, abbreviation or label), algebraic calculation (usage of parentheses, etc.), translation between various representations (graphic, geometric, algebraic, natural language), type of justifications ("proof" by example, proof by algebra, proof by explanation, "proof" by incorrect rule), and numerical calculation (Figure 2). For instance, an answer qualified as "partially correct, incorrect use of parentheses and algebraic justification" is coded "T2 M31 R1." Similarly, "T3 M4 R3" stands for "incorrect, incorrect identification of + and x, justification by numerical example."
• Combien? computes immediate feedback provided to the student on the correctness or on the types of errors (on average, twenty types of errors for each of the fifteen classes of problems). Combien? also stores usage data: action timestamp, action type (e.g., student's input, asking for help, asking for a random drawing), and action parameters.
• The Logic-ITA computes immediate feedback to the student on the correctness or the error type. It also stores timestamps, a list of the errors made and a list of the rules correctly used.
Related Patterns: DP LA1.1, DP LA1.2, DP LA1.3.
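To make the composite code concrete, the sketch below decodes strings such as "T2 M31 R1" using only the code values spelled out in the Pépite example above; any other value is reported as unknown. The dictionaries are deliberately incomplete and are not the full Pépite coding scheme.

```python
# Partial lookup tables, limited to the codes mentioned in the example.
CODE_TABLES = {
    "T": {"T2": "partially correct", "T3": "incorrect"},
    "M": {"M31": "incorrect use of parentheses", "M4": "incorrect identification of + and x"},
    "R": {"R1": "algebraic justification", "R3": "justification by numerical example"},
}

def decode(composite_code):
    """composite_code: space-separated codes, e.g. 'T2 M31 R1'."""
    readings = []
    for code in composite_code.split():
        table = CODE_TABLES.get(code[0], {})
        readings.append(table.get(code, f"unknown code {code}"))
    return ", ".join(readings)

print(decode("T2 M31 R1"))  # partially correct, incorrect use of parentheses, algebraic justification
print(decode("T3 M4 R3"))   # incorrect, incorrect identification of + and x, justification by numerical example
```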
DP LA1.1 Pattern Matching to Analyze the Learner's Solution
Abstract: For some types of problems, a pattern matching approach can help you assess the learner's solution.
Context: It is the same as in LA1. This solution is relevant when an analysis grid is available for the exercise, providing patterns of answers, for instance when the expected answers are multiple choices or arithmetic or algebraic expressions.
Problem: How to use a pattern matching approach to help you analyze a learner's solution?
Requisite: You need indicators built from a pedagogical, didactical or cognitive analysis. For instance:
• A grid of correct answers. When there is a single way to express a solution in the system, an analysis grid gives correct and wrong answers. For Multiple Choice Questions, your system has to be provided with a grid of correct answers.
• Patterns of good answers or of common learners' answers. When there are several ways to express a solution, a pattern gives a general model of this solution.
Solution: For a class of problems, a pedagogical, cognitive or didactical analysis provides you with a set of criteria to carry out a multidimensional characterization of a pattern of solutions. Thus, when you can match the learner's answer with one pattern of solution, you know how to characterize the solution.
Results: See LA1.
Discussion: For open questions, it is hard work to provide patterns of solutions; nevertheless, it is sometimes possible.
Example: A very simple example in Pépite is: if E = 6P is a good answer, the system also accepts E = 6 * P, P = E / 6, etc. Thus, E = 6*P is a pattern for a correct answer: every algebraic expression equivalent to E = 6*P is correct and assessed as "correct, translating into algebra, correct use of letters". But P = 6E or E + P > 6 are incorrect and assessed as "incorrect, translation by abbreviating, use of letters as labels."
Related Patterns: This pattern is a specialization of DP LA1.
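One possible way to implement such a pattern of equivalent algebraic answers is to compare the student's equation with the reference symbolically, as in the sketch below. It relies on the sympy library and on a simple "proportional residuals" test; this is our illustration, not the matching technique actually used in Pépite.

```python
import sympy as sp

def matches_pattern(student_eq, reference_eq):
    """Both arguments are strings of the form 'lhs = rhs'.
    Returns True when the two equations describe the same relation."""
    def residual(text):
        lhs, rhs = text.split("=")
        return sp.sympify(lhs) - sp.sympify(rhs)
    r_student, r_reference = residual(student_eq), residual(reference_eq)
    if r_student == 0 or r_reference == 0:
        return False
    # Equivalent when one residual is a nonzero constant multiple of the other.
    ratio = sp.simplify(r_student / r_reference)
    return not ratio.free_symbols and ratio != 0

print(matches_pattern("E = 6*P", "E = 6*P"))  # True
print(matches_pattern("P = E/6", "E = 6*P"))  # True  (equivalent rewriting)
print(matches_pattern("P = 6*E", "E = 6*P"))  # False (letters used as labels)
```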
170
Delozanne, Le Calvez, Merceron, and Labat
of the solution until the solution is totally correct. Some applications provide error messages and thus you can use them to characterize errors. Results: See LA1. Examples: Combien? and Logic-ITA use an expert system: • In Combien? the learner builds a solution to a combinatorics exercise. Each step of the solution is analyzed by the system using a “targeted detection of error” (Giroire, Le Calvez, Tisseau, Duma, Urtasun, 2002). If the learner’s construction cannot lead to a correct solution, Combien? provides hints to help the student achieve a correct solution. • In the Logic-ITA, an exercise is a formal proof. The learner has to derive the conclusion from the premises using logical rules and producing intermediary formulas. The expert system checks whether each intermediary formula entered by the learner is correct and provides appropriate feedback. In case of a mistake, a hint is given to help the learner correct the mistake and enter a correct formula. The mistake is stored by the system. Otherwise the system stores the correct use of the logic rule. Related Patterns: This pattern is a specialization of DP LA 1.
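To illustrate DP LA1.1 and DP LA1.2 together, the following sketch uses SymPy as a stand-in Computer Algebra System to decide whether a learner's equation matches the pattern "any expression equivalent to E = 6P" from the Pépite example above. This is not how Pépite is implemented; the function and its equivalence test are assumptions for illustration only.

```python
import sympy as sp

E, P = sp.symbols("E P")

def matches_pattern(learner_answer: str, correct_lhs, correct_rhs) -> bool:
    """Return True if the learner's equation is algebraically equivalent
    to the reference equation correct_lhs == correct_rhs (a sketch)."""
    lhs_text, _, rhs_text = learner_answer.partition("=")
    try:
        lhs = sp.sympify(lhs_text)
        rhs = sp.sympify(rhs_text)
    except sp.SympifyError:
        return False  # not a parsable algebraic expression
    # Equivalent iff (lhs - rhs) is a non-zero constant multiple of
    # (correct_lhs - correct_rhs), e.g. E - 6*P versus P - E/6.
    diff_learner = sp.simplify(lhs - rhs)
    diff_correct = sp.simplify(correct_lhs - correct_rhs)
    ratio = sp.simplify(diff_learner / diff_correct)
    return bool(ratio.is_constant() and ratio != 0)

print(matches_pattern("E = 6*P", E, 6 * P))   # True
print(matches_pattern("P = E/6", E, 6 * P))   # True  (equivalent form)
print(matches_pattern("P = 6*E", E, 6 * P))   # False (incorrect answer)
```

A production system would also have to classify the failure (e.g., "translation by abbreviating"), which is exactly the multidimensional characterization that DP LA1 asks the analysis to provide.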
DP LA1.3 Human Assessor to Check the Automatic Analysis of the Learner's Solution

Abstract: Either the teacher assesses the answer herself, or she completes, verifies or corrects the system's assessment of the learner's answer.

Context: The context is the same as in DP LA1. This solution is relevant if the learner's solution is composite, if learners answered in their own words, if the diagnosis expertise is not yet formalized, or if the assessment is for teacher-training purposes.

Problem: How can the learner's solution be assessed when the automatic diagnosis fails or has a low level of confidence?

Solution: Your system provides a human assessor with a list of exercises where the automatic analysis failed or is not fully reliable, together with an interface to assist the human assessor (a small routing sketch follows).

Results: See DP LA1.

Discussion: If a large number of learners are enrolled in your course, or if the teachers are very busy (as is often the case), this solution is unrealistic because it is time consuming. But it works well when you need a very accurate assessment of individual learners, for example in a research or teacher-development context.

Example: Pépite does not fully assess the solution when learners use natural language; it provides a software tool that allows teachers to correct, verify or complete the automatic diagnosis.

Related Patterns: This pattern is a specialization of DP LA1.
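A minimal sketch of the routing step in DP LA1.3, assuming each automatic diagnosis carries a hypothetical confidence score; the field names and threshold are illustrative and not taken from Pépite.

```python
from typing import Dict, List, Optional

def needs_human_review(diagnoses: List[Dict], threshold: float = 0.9) -> List[Dict]:
    """Select answers whose automatic diagnosis failed (no confidence)
    or fell below a confidence threshold, so a teacher can verify,
    correct or complete the assessment."""
    return [d for d in diagnoses
            if d.get("confidence") is None or d["confidence"] < threshold]

diagnoses = [
    {"learner": "A12", "exercise": "ex-3", "confidence": 0.98, "grade": "correct"},
    {"learner": "A12", "exercise": "ex-4", "confidence": None},   # free-text answer, not analysed
    {"learner": "B07", "exercise": "ex-3", "confidence": 0.55, "grade": "incorrect"},
]
for d in needs_human_review(diagnoses):
    print(d["learner"], d["exercise"], "-> send to human assessor")
```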
DP LA2 Overview of a Learner's Activity Across a Set of Exercises

Abstract: This pattern offers several approaches to provide different stakeholders with a general view of a learner's work across a set of exercises or a whole course.

Context: The learner is asked to solve complex problems in an individual learning system or a diagnosis system. In both cases, the system collects the learner's answers and assesses them. The objectives of this assessment vary: for example, to group learners for remediation, to make learners aware of their strong and weak points, or to let them situate themselves on a predefined scale of competence.

Problem: How can the system give a general view of the learner's successes, failures or cognitive profile? Strategic decision making requires a high-level description of a learner's activity. For instance, in order to tailor learning situations, teachers need a synthetic view of the learner's activity or an account of the learner's evolution. A classification of the learner on a predefined scale of competence may be useful for organizing working groups or for learners to situate themselves with respect to expected skills. Thus, you want to define the main features that summarize the learner's competence. In a diagnosis system, the general view is an instantaneous picture of the learner's competence; in a learning environment, the learner's evolution over time can be analyzed.

Requisite: See DP LA1 Results.

Solution: Collect the learner's answers to a set of questions or exercises, or to a whole course. First carry out the analysis of each answer on every exercise. Then define the dimensions you need (with teachers or researchers) and, finally, build an overview according to your stakeholders' objectives. For example, you may decide to determine the learner's strong and weak points, to situate the learner on a scale of competence, or to report on the evolution of a particular skill during the course (a minimal aggregation sketch follows).

Results: A synthetic analysis of the learner's competence.

Discussion: It is crucial to define the dimensions of the overview and to pay careful attention to your stakeholders' needs and to how they will use the overview.

Examples:
• Combien? and the Logic-ITA summarize items such as the total time spent, the number of exercises attempted, succeeded and failed, the errors made, the rules correctly used, etc.;
• Pépite builds a cognitive profile of the learner's competence in algebra (see DP LA2.1 Results).

Related Patterns: DP LA1, DP LA2.1.
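A minimal sketch of the kind of aggregation DP LA2 describes, in the spirit of the Combien? and Logic-ITA summaries listed in the Examples. The record format is a hypothetical assumption, not the format used by those systems.

```python
from typing import Dict, List

def learner_overview(records: List[Dict]) -> Dict:
    """Summarise one learner's per-exercise results (DP LA2): exercises
    attempted / passed / failed, total time spent and errors made."""
    passed = sum(1 for r in records if r["passed"])
    all_errors = [e for r in records for e in r["errors"]]
    return {
        "attempted": len(records),
        "passed": passed,
        "failed": len(records) - passed,
        "time_spent_s": sum(r["time_s"] for r in records),
        "errors": all_errors,
    }

records = [
    {"exercise": "ex-1", "passed": True,  "time_s": 180, "errors": []},
    {"exercise": "ex-2", "passed": False, "time_s": 420, "errors": ["wrong rule", "wrong rule"]},
]
print(learner_overview(records))
```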
DP LA2.1 The Learner's Strong and Weak Points

Abstract: Highlighting strong and weak points in a learner's performance on a set of exercises is a frequent way to build an overview of a learner's work.

Context: The context is the same as in DP LA2. The learner was asked to solve several exercises involving different skills.

Problem: How can a learner's strong and weak points be defined?

Requisite: See DP LA1 Results.

Solution: After assessing each answer to each exercise, you can run a cross analysis over a whole session or period of time. First, calculate the success rate for each skill involved in the set of exercises (or in the course). Then, for each skill or error, count the number of occurrences. Sometimes you may also need the context in which the skills were evidenced or in which errors occurred; in that case you need a categorization of the exercises or of the errors (a minimal sketch follows).

Results:
• A list of skills and the success rates on these skills;
• A list of errors with the number or the context of their occurrences.

Discussion: It is very important to report both strong and weak points, and not only the learner's errors. In a report, highlighting strong points encourages the student, and even the teacher. If you want to help the student progress, teaching strategies may differ according to the skills the learner already masters.

Example: Pépite provides an overview by giving a level on three dimensions (usage of algebra, translation from one representation to another, algebraic calculation). For each dimension, it lists strong points (mastered skills with their success rates and the contexts where they are evident) and weak points (categorized errors with their frequencies and contexts of occurrence).

Related Patterns: This pattern is a specialization of DP LA2.
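A minimal sketch of the cross analysis described in DP LA2.1: success rates per skill and error frequencies, computed from hypothetical per-answer assessments. The field names are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def strong_and_weak_points(assessments: List[Dict]) -> Tuple[Dict, Counter]:
    """Compute the success rate per skill and the frequency of each
    error over a set of assessed answers (illustrative field names)."""
    per_skill = defaultdict(lambda: [0, 0])   # skill -> [successes, attempts]
    errors = Counter()
    for a in assessments:
        for skill in a["skills"]:
            per_skill[skill][1] += 1
            if a["success"]:
                per_skill[skill][0] += 1
        errors.update(a["errors"])
    success_rates = {s: ok / total for s, (ok, total) in per_skill.items()}
    return success_rates, errors

assessments = [
    {"skills": ["use of letters"], "success": True, "errors": []},
    {"skills": ["use of letters", "parentheses"], "success": False,
     "errors": ["incorrect use of parentheses"]},
]
rates, error_counts = strong_and_weak_points(assessments)
print(rates)          # {'use of letters': 0.5, 'parentheses': 0.0}
print(error_counts)   # Counter({'incorrect use of parentheses': 1})
```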
DP LA4 Overview of the Activity of a Group of Learners on a Set of Exercises

Abstract: This pattern provides teachers with a general view of a group of learners' work on a set of exercises.

Context: A set of exercises, either linked to a specific course or composing a diagnosis system, has been completed by a group of learners.

Problem: How can your system produce a general view of a group of learners' work on a whole set of exercises? Strategic decision making may require a synthetic view of learners' work on a whole activity. For instance, this view can help organize work groups in a classroom. It can help teachers reflect on the quality and adequacy of exercises and course material, for example if it shows that some mistakes co-occur. It provides instructional designers or teachers with information that could help them improve their teaching or their design (see DP MV1).

Requisite: See DP LA1 Results.

Solution: If your objective is to group learners by ability with respect to a grid of competence or to learning objectives, then you will produce a map of the class, that is, a classification of learners into different groups. You may predefine stereotypes. A stereotype is a means to identify groups of students who share the same characteristics. These characteristics can be as simple as a high, average or low mark, or more precise, for example describing learners who master skills C1, C2 and C3 but not C4 and C5. You then map each learner to the group she belongs to, according to the analysis results obtained for each exercise and the relation you have defined between exercises and skills. You may also let the system find groups for you, using an automatic clustering algorithm from the Data Mining field. If your objective is to get an overview of learners' performance, you can produce statistics or charts that group exercises by chapter or ability. If your objective is to detect associations such as "if learners make mistake A, then they also make mistake B" or "if learners fail exercise E, then they also fail exercise F," then you may use an association-rule algorithm from the Data Mining field (a simplified co-occurrence sketch follows).

Results: If you implement this pattern, you will characterize the activity of your group of students with some of the following items:
• A grade book for the whole class and statistics on all students' performance;
• A map of the class, grouping learners by ability;
• Links between mistakes that often occur together, or between exercises often failed or passed together.

Discussion: Stereotypes can be very simple (low-achieving, regular, high-achieving students), multidimensional (ranking students on a multidimensional scale, for instance in second-language learning) or usage-based (player, systematic learner, butterfly, etc.).

Examples:
• In Pépite, stereotypes are used to classify learners by group of abilities and to offer a cognitive map of the class, grouping students by stereotype.
• With the Logic-ITA, automatic clustering is used to find out whether failing learners can be split into different groups for better remediation.

Related Patterns: This pattern is specialized by DP LA4.1 and DP LA4.2. If your objective is to improve the course, see the DP MV1 hierarchy (DPULS, 2005, 6).
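The association step mentioned in the Solution ("if learners make mistake A, then they also make mistake B") can be approximated without a full association-rule library by counting pairwise co-occurrences. The sketch below uses made-up data; a real deployment would more likely rely on an Apriori-style algorithm from a Data Mining toolkit.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

def frequent_mistake_pairs(per_learner_mistakes: List[Set[str]],
                           min_support: float = 0.5) -> List[Tuple[Tuple[str, str], float]]:
    """Return mistake pairs whose support (fraction of learners making
    both) reaches min_support -- a simplified stand-in for an
    association-rule algorithm."""
    n = len(per_learner_mistakes)
    pair_counts = Counter()
    for mistakes in per_learner_mistakes:
        pair_counts.update(combinations(sorted(mistakes), 2))
    return [(pair, count / n) for pair, count in pair_counts.items()
            if count / n >= min_support]

# Hypothetical mistake sets, one per learner
class_mistakes = [
    {"wrong rule", "premise misuse"},
    {"wrong rule", "premise misuse", "syntax"},
    {"syntax"},
]
print(frequent_mistake_pairs(class_mistakes))
# [(('premise misuse', 'wrong rule'), 0.666...)] -- these two mistakes co-occur
```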
DP LA4.1 Automatic Clustering

Abstract: Use automatic clustering to get an overview of a group of learners' activity on a set of exercises.

Context: The context is the same as in DP LA4. When each learner is characterized by a set of attributes, such as the exercises she passed or failed, the mistakes made, or the abilities she masters or lacks, you can use automatic clustering to split the whole group of learners into homogeneous subgroups.

Problem: How can an overview of a group of learners be obtained using automatic clustering? You want to organize groups in a classroom. Whether it is better to work with homogeneous or heterogeneous groups is a pedagogical decision; in both cases, you first have to identify homogeneous groups, that is, groups of learners who are similar on some characteristics. However, you do not have any predefined stereotypes, or you want to explore whether some other grouping would make sense. You may try automatic clustering.

Requisite: As for DP LA4.

Solution: For each learner, the system provides an analysis of each answer submitted for each exercise. You concatenate or summarize these analyses to obtain a characterization of each learner. For example, each learner can be characterized by the pass or fail obtained on each exercise; another simple characterization is the total number of mistakes made on all attempted exercises. From these characteristics, you select the ones to feed into the automatic clustering algorithm and run it. The algorithm produces clusters of learners; all learners in one cluster are similar with respect to the characteristics you chose (a brief sketch follows).

Results: You obtain a map of the class with students grouped into clusters.

Discussion: Using automatic clustering requires some expertise in the Data Mining field. Choosing the right characteristics for the algorithm and interpreting the resulting clusters are crucial and difficult steps.

Example: In the Logic-ITA experience, we used automatic clustering to explore whether failing learners can be grouped into homogeneous clusters. Failing learners are learners who attempt exercises without completing them successfully. As a characteristic we used the number of mistakes made. The result was two clusters: learners making many mistakes and learners making few mistakes. In the cluster of learners making many mistakes, one could identify learners with a "guess and try" strategy, applying logic rules one after the other until they hit the right rule to solve an exercise.

Related Patterns: This pattern is a specialization of DP LA4.
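A sketch of DP LA4.1 using scikit-learn's KMeans as one possible clustering algorithm, with the number of mistakes as the single characteristic, as in the Logic-ITA example above. The figures are invented, and the Logic-ITA experiment is not necessarily implemented this way.

```python
import numpy as np
from sklearn.cluster import KMeans

# One row per failing learner; single characteristic: number of mistakes
# made on attempted exercises (made-up figures for illustration).
mistakes = np.array([[2], [3], [25], [1], [30], [4], [27]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(mistakes)
for label in range(2):
    members = mistakes[kmeans.labels_ == label].ravel()
    print(f"cluster {label}: mistake counts {sorted(members.tolist())}")
# Expected outcome: one cluster of learners making few mistakes,
# one cluster of learners making many mistakes.
```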
DISCUSSION
How should a set of DPs be evaluated? This is a hot issue in DP communities (Avgeriou et al., 2004; Buschmann et al., 1996; Chung et al., 2004; Salingaros, 2000; E-LEN project; Todd, Kemp, & Philips, 2004). Most pattern sets described in the literature are validated through a peer-review process. Some authors suggest that a pattern can be accepted by a community if it is used in three experiences other than the one that proposed it (Buschmann et al., 1996). In this section, we discuss how our DP set accounts for solutions adopted in other experiences.

We checked the validity of our DPs in a three-step process. First, we discussed the patterns within the AIDA team. Second, the DPs were discussed and refined within the DPULS consortium. Third, we examined the contributions to the AIED'05 workshop on Usage Analysis (Choquet, Luengo, & Yacef, 2005) that deal with assessment, to investigate whether these experiences match our approach.

We worked out the above patterns to generalize some success stories in the AIDA team, namely the three systems presented in the second section and three others: Diane, Math Logs and the Java Course. Diane is a diagnosis system: it identifies adequate or erroneous strategies, and the cognitive mechanisms involved, in the solving of arithmetic problems by elementary-school children (8-10 years old) (Hakem, Sander, & Labat, 2005). DP LA1 and DP LA2 account for Diane's approach to cognitive diagnosis. Math Logs provides researchers with information about undergraduate students' performance on mathematical exercises delivered on the Wims platform. The mathematical expressions entered by learners are checked by Computer Algebra Systems (CAS) such as MuPAD, PARI or Maxima (accounted for by DP LA1.2). Math Logs displays to teachers average grades, average time spent by type of exercise, and indicators of the evolution of grades over time. It detects students' usage strategies, such as focusing on easier exercises or preferring challenging ones (DP LA2, LA3 and LA4). The Java Course is an online course for introductory programming in Java. Students write programs and submit them for compilation and execution. Their programming and compilation errors are stored in a database (DP LA1.2). For each exercise and each student, the system displays whether the exercise has been attempted and passed and the number of mistakes made, and it provides an overview per chapter and for the whole set (DP LA2).

From the AIED'05 workshop on Usage Analysis in Learning Systems (Choquet et al., 2005), we selected six papers dealing with assessment. Kumar (2005) presents a C++ programming tutor. In this system, each exercise involves exactly one learning objective. The system automatically checks whether the student's answer is correct, partially correct, incorrect, missed or not attempted (DP LA1.1). Then an overview of each student's work is produced.
This overview gives, for each learning objective, the fraction of exercises solved correctly, partially correctly, etc. (DP LA2). For a whole class, the average performance on each exercise is computed (DP LA3) and, finally, the average performance of the class on each learning objective is also calculated (DP LA4). Designers use the latter to refine the templates that automatically generate the exercises according to the objectives (DP MV1 "Material Improvement").

Feng and Heffernan (2005) describe Assistment, a web-based mathematics tutor that provides the teacher with a report on the student's activity and the student with assistance when she is stuck on a problem. This assistance is given through scaffolding questions intended both to help the student and to help the diagnosis system understand which knowledge components are involved in a failure when the problem involves more than one component (DP LA1). The system then builds a grade book providing the teacher with each student's results (DP LA2) and with an overview of the class (DP LA4). Another overview of the class is given by a Class Summary Report and a Class Progress Report. An item analysis then helps determine the difficult points for students and improve both teaching and materials (DP MV1 "Material Improvement"). This system design is a very clever example of the best practices we wanted to account for in our DP set.

Nicaud, Chaachoua, Bittar, and Bouhineau (2005) model students' behavior in algebraic calculation in Aplusix, a system for learning elementary algebra. The diagnosis is a two-step process. The first phase is a local diagnosis of each transformation of an algebraic expression performed by the student. From a library of correct and incorrect rules, the system determines the sequence of rules used during the transformation. The diagnosis system also characterizes the type of exercise and the algebraic context in a "Local Behavior Vector" (LDV) (DP LA1, for one step). The second phase uses a lattice of conceptions to build an overview of a student's conceptions from the different LDVs over a set of algebraic transformations (DP LA2). Aplusix offers teachers a map of conceptions for the whole class (DP LA4).

Heraud, Marty, France, and Carron (2005), Stefanov and Stefanova (2005), and Muehlenbrock (2005) do not focus on learners' assessment; they describe approaches to building an overview of a learner from different perspectives. They work with a trace of a student's actions in an e-learning platform. The trace records success and task completion on exercises, timestamps, and course or help consultations (DP LA1). Stefanov and Stefanova (2005) use a learning-object approach in Virtuoso to provide students, teachers and designers with statistics on students' strong and weak points (DP LA2.1) and on success and failure rates for each learning object (DP LA3), in order to improve the material (DP MV hierarchy). Muehlenbrock (2005) uses decision trees to classify students into three categories: low-, medium- or high-achieving (DP LA4). To give an overview of the student's activity in a session (DP LA2), the system described by Heraud et al. (2005)
compares the student's trace to the prescribed learning scenario. It displays the trace in a band showing the time spent on each task along with the prescribed time. The system shadows the prescribed line to provide what the authors call a "shadow bar" when there is a discrepancy between the trace and the prescribed time. A human assessor, called the "trace composer," can, when aware of such a shadow zone in the trace, investigate other sources of information such as the student's logs on the server, the learner's workstation, or a human observer. This experience suggests introducing a "human assessor" pattern into the DP LA2 hierarchy. It also suggests refining the hierarchy with usage data, whereas so far we have focused on accurate cognitive information.

This review shows that our approach accounts for solutions adopted in many learning systems, at least in the domains of programming and mathematics tutors. Further investigation is needed to see whether our DPs are pertinent for other domains, though this seems to be the case. For instance, standardized language tests (e.g., TOEFL, TOEIC, IELTS⁵, TFI) use an approach similar to our DPs. On the basis of students' scores, they classify students on a predefined scale of competence in listening, reading, writing and speaking skills. This fits well with the multidimensional analysis of DP LA1.

In this section we discussed whether our DP collection on learners' assessment matches the state of the art in the domain. Our objective was to capitalize on success stories and to help designers build successful e-learning systems. We consider that the first point has been validated, but it is premature to claim the second: so far, only our own students have used the DP set. For technical reasons our DP set is not yet publicly available, but it will soon be published on the Web. In particular, it will be used in the Virtual Doctoral School of the Kaleidoscope Network⁶. This will provide interesting feedback, and we hope the pattern language will be enriched.

CONCLUSION
In this article, we focussed on DPs dealing with learners' assessment, a subset of the DPs elaborated by the DPULS consortium (DPULS, 2005, 6). One outcome of our work is to highlight basic principles for designers whose objective is to provide teachers with an overview of the activity of a group of learners on a set of exercises (DP LA4). A fundamental step is to obtain a multidimensional analysis of a learner's solution on a single exercise (DP LA1). DP LA1 is specialized by other DPs depending on how the student's solution is analyzed (pattern matching, use of specific software, human assessment). It can be used to obtain both an overview of a learner's activity across a set of exercises (DP LA2) and an overview of the activity of a group of learners on a single activity (DP LA3). These two DPs are in turn used by DP LA4. When DP LA1 is applied to exercises characterized by precisely defined skills, DP LA4 gives a way to track students' problem-solving abilities.
These DPs are high-level DPs. Future work will consist in writing more specific DPs describing how to implement the different solutions. To support DP LA4, we would like to work out patterns on data mining techniques. A complementary approach to building an overview is used by Pépite, Aplusix, Assistment and second-language assessment: it consists in situating a student in predefined classes based on cognitive or educational research, and not only on achievement. Further investigation is needed to study which parts of the assessment are domain independent and could be implemented in e-learning platforms. Conversely, we think that domain-specific patterns would give more accurate guidance to designers. For instance, our DP set could be completed with patterns that collect practices on displaying assessment results to different actors in specific domains. So far this work has demonstrated that DPs are a very fruitful approach for generalizing from different design experiences and capitalizing on them. It was a powerful integrative project because we had to distill, out of our specific projects, what was invariant in good design solutions. "Patterns are very much alive and evolving" (Alexander et al., 1977). We hope to receive feedback from practitioners to develop and enrich this first DP language on learners' assessment.

References

Alexander, C., Ishikawa, S., Silverstein, M., Jacobson, M., Fiksdahl-King, I., & Angel, S. (1977). A pattern language: Towns, buildings, construction. New York: Oxford University Press.

Avgeriou, P., Papasalouros, A., Retalis, S., & Skordalakis, M. (2003). Towards a pattern language for learning management systems. Educational Technology & Society, 6(2), 11-24.

Avgeriou, P., Vogiatzis, D., Tzanavari, A., & Retalis, S. (2004). Towards a pattern language for adaptive web-based educational systems. Advanced Technology for Learning, 1(4). ACTA Press.

Buschmann, F., Meunier, R., Rohnert, H., Sommerlad, P., & Stal, M. (1996). Pattern-oriented software architecture. Volume 1: A system of patterns. John Wiley & Sons.

Choquet, C., Luengo, V., & Yacef, K. (Eds.) (2005). Proceedings of the "Usage Analysis in Learning Systems" workshop, held in conjunction with the 12th Conference on Artificial Intelligence in Education, Amsterdam, The Netherlands. Retrieved December 2005 from http://hcs.science.uva.nl/AIED2005/W1proc.pdf

Choquet, C. (2004). JEIRP design patterns for recording and analysing usage of learning systems proposal. Retrieved March 2006 from http://www.noe-kaleidoscope.org/pub/activities/jeirp/activity.php?wp=33

Chung, E. S., Hong, J. I., Lin, J., Prabaker, M. K., Landay, J. A., & Liu, A. (2004). Development and evaluation of emerging design patterns for ubiquitous computing. In Proceedings of Designing Interactive Systems (DIS2004), Boston, MA, pp. 233-242.

Combien (2000). The Combien? project site, http://www.lip6.fr/combien/
Delozanne, E., Prévit, D., Grugeon, B., & Jacoboni, P. (2003). Supporting teachers when diagnosing their students in algebra. In Workshop Advanced Technologies for Mathematics Education, supplementary proceedings of Artificial Intelligence in Education, Sydney, July 2003. Amsterdam: IOS Press, pp. 461-470.

Delozanne, E., Vincent, C., Grugeon, B., Gélis, J.-M., Rogalski, J., & Coulange, L. (2005). From errors to stereotypes: Different levels of cognitive models in school algebra. In Proceedings of E-LEARN 2005, Vancouver, 24-28 October 2005.

Deng, J., Kemp, E., & Todd, E. G. (2005). Managing UI pattern collections. CHINZ'05, Auckland, NZ.

DPULS 2-7 (2005). Deliverables. Retrieved January 2006 from http://www.noe-kaleidoscope.org/intra/activity/deliverables/
Deliverable 2: Merceron, A. Report on partners' experiences.
Deliverable 3: David, J.-P. State of the art of tracking and analyzing usage.
Deliverable 4: Pozzi, F. The set of recurrent problems and description of solutions.
Deliverable 5: Verdejo, M. F., & Celorrio, C. The design pattern language structure.
Deliverable 6: Delozanne, E., Labat, J.-M., Le Calvez, F., & Merceron, A. The structured set of design patterns.
Deliverable 7: Iksal, S., Alexieva, A., Beale, R., Byrne, W., Dupont, F., Londsdale, P., & Milen, P. The design pattern browser.

Duval, P., Merceron, A., Scholl, M., & Wargon, L. (2005). Empowering learning objects: An experiment with the Ganesha platform. In P. Kommers & G. Richards (Eds.), Proceedings of the World Conference on Educational Multimedia, Hypermedia and Telecommunications, ED-MEDIA 2005, Montreal, Canada, 2005(1), pp. 4592-4599. AACE Digital Library (http://www.aace.org).

E-LEN (n.d.). The E-LEN patterns site, http://www2.tisip.no/E-LEN/patterns_info.php (consulted December 2005).

Feng, M., & Heffernan, N. (2005). Informing teachers live about student learning: Reporting in the Assistment system. In Proceedings of the "Usage Analysis in Learning Systems" workshop, held in conjunction with the 12th Conference on Artificial Intelligence in Education, Amsterdam, The Netherlands, pp. 25-32. Retrieved December 2005 from http://hcs.science.uva.nl/AIED2005/W1proc.pdf

Giroire, H., Le Calvez, F., Tisseau, G., Duma, J., & Urtasun, M. (2002). Targeted detection: Application to error detection in a pedagogical system. In Proceedings of ITS 2002, Biarritz, p. 998.

Goodyear, P., Avgeriou, P., Baggetun, R., Bartoluzzi, S., Retalis, S., Ronteltap, F., & Rusman, E. (2004). Towards a pattern language for networked learning. In Proceedings of Networked Learning 2004, Lancaster, UK.

Hakem, K., Sander, E., & Labat, J.-M. (2005). DIANE, a diagnosis system for arithmetical problem solving. In Looi, McCalla, Bredeweg, & Breuker (Eds.), Proceedings of Artificial Intelligence in Education 2005. Amsterdam: IOS Press, pp. 258-265.

Heraud, J.-M., Marty, J.-C., France, L., & Carron, T. (2005). Helping the interpretation of web logs: Application to learning scenario improvement. In Proceedings of the "Usage Analysis in Learning Systems" workshop, held in conjunction with the 12th Conference on Artificial Intelligence in Education, Amsterdam, The Netherlands, pp. 41-48. Retrieved December 2005 from http://hcs.science.uva.nl/AIED2005/W1proc.pdf
Kumar, A. (2005). Usage analysis in tutors for C++ programming. In Proceedings of the "Usage Analysis in Learning Systems" workshop, held in conjunction with the 12th Conference on Artificial Intelligence in Education, Amsterdam, The Netherlands, pp. 57-64. Retrieved December 2005 from http://hcs.science.uva.nl/AIED2005/W1proc.pdf

Le Calvez, F., Giroire, H., Duma, J., Tisseau, G., & Urtasun, M. (2003). Combien? A software to teach learners how to solve combinatorics exercises. In Workshop Advanced Technologies for Mathematics Education, supplementary proceedings of Artificial Intelligence in Education, Sydney, Australia. Amsterdam: IOS Press, pp. 447-453.

Merceron, A., & Yacef, K. (2004). Mining student data captured from a web-based tutoring tool: Initial exploration and results. Journal of Interactive Learning Research, Special Issue on Computational Intelligence in Web-Based Education, 15(4), 319-346.

Meszaros, G., & Doble, J. (1998). A pattern language for pattern writing. In Pattern Languages of Program Design 3 (Software Patterns Series). Addison-Wesley. Retrieved December 2005 from http://hillside.net/patterns/writing/patterns.htm

Muehlenbrock, M. (2005). Automatic action analysis in an interactive learning environment. In Proceedings of the "Usage Analysis in Learning Systems" workshop, held in conjunction with the 12th Conference on Artificial Intelligence in Education, Amsterdam, The Netherlands, pp. 73-80. Retrieved December 2005 from http://hcs.science.uva.nl/AIED2005/W1proc.pdf

Nicaud, J.-F., Chaachoua, H., Bittar, M., & Bouhineau, D. (2005). Student's modelling with a lattice of conceptions in the domain of linear equations and inequations. In Proceedings of the "Usage Analysis in Learning Systems" workshop, held in conjunction with the 12th Conference on Artificial Intelligence in Education, Amsterdam, The Netherlands, pp. 81-88. Retrieved December 2005 from http://hcs.science.uva.nl/AIED2005/W1proc.pdf

Pepite (2000). The Pépite project site, http://pepite.univ-lemans.fr

PPP (n.d.). The Pedagogical Patterns Project site, http://www.pedagogicalpatterns.org/ (consulted December 2005).

Salingaros, N. (2000). The structure of pattern languages. Architectural Research Quarterly, 4, 149-161.

Schmidt, D., Stal, M., Rohnert, H., & Buschmann, F. (2000). Pattern-oriented software architecture. Volume 2: Patterns for concurrent and networked objects. Wiley.

Stefanov, K., & Stefanova, E. (2005). Analysis of the usage of the Virtuoso system. In Proceedings of the "Usage Analysis in Learning Systems" workshop, held in conjunction with the 12th Conference on Artificial Intelligence in Education, Amsterdam, The Netherlands, pp. 97-104. Retrieved December 2005 from http://hcs.science.uva.nl/AIED2005/W1proc.pdf

Sweller, J., van Merrienboer, J. G., & Paas, F. G. W. C. (1998). Cognitive architecture and instructional design. Educational Psychology Review, 10(3), 251-295.

Todd, E., Kemp, E., & Philips, C. (2004). What makes a good user interface pattern language? In The 5th Australasian User Interface Conference, Dunedin. Australian Computer Society, pp. 91-100.

Vandebrouck, F., Cazes, C., Gueudet, G., & Hersant, M. (2005). Problem solving and web resources at tertiary level. In Proceedings of the 4th Conference of the European Society for Research in Mathematics Education, CERME 2005, Barcelona, Spain.

van Duyne, D. K., Landay, J., & Hong, J. (2003). The design of sites: Patterns, principles and process for crafting a customer-centered web experience. Addison-Wesley.

van Welie, M. (2004). Patterns in interaction design. Retrieved March 2006 from http://www.welie.com/index.html
Notes

1. http://www.noe-kaleidoscope.org/ (consulted March 2006)
2. The DPULS consortium involved researchers and practitioners from six European countries and from different domains: educational design, educational research, teachers, engineers, LMS managers and LMS designers.
3. AIDA is a consortium of research laboratories in the Paris area focusing on e-learning problems. It is funded by the French Ministry of Research. AIDA belonged to the DPULS consortium.
4. It codes every answer expressed by multiple choices or by one algebraic expression, most answers expressed by several algebraic expressions, and some answers in students' own words.
5. International English Language Testing System
6. Kaleidoscope Virtual Doctoral School: http://www.noe-kaleidoscope.org/pub/activities/backbone/activity.php?wp=79 (consulted March 2006)