Automatic Behavioral Analysis

In the context of the CHIL project, we investigated the scenario of teamwork. We developed a component that automatically detects the functional roles of the participants in a small-group interaction. In this scenario, we investigated the usefulness and acceptability of a functionality inspired by the practice of coaching: a report on the relational/social behavior of individual participants, generated from multimodal information and privately delivered after the end of the meeting. Following a user-centred design approach, we first investigated the acceptability of such a service with a qualitative study (focus groups) and with a Wizard-of-Oz experiment. The results of these studies suggested that the service might be accepted by clerical users, although technical people may be biased against a system that makes judgments (see Pianesi et al. 2006).

To implement such a functionality, an automatic system should be able to observe the meeting as a coach would; that is, by abstracting over low-level (visual, acoustic, etc.) information to produce medium/coarse-grained information about the relational roles that members play in the group. Building on Bales' Interaction Process Analysis and drawing on observations of a set of face-to-face meetings, a new coding scheme was produced (Falcon et al. 2005).

To train a machine learning system to automatically code this kind of relational behavior, we collected a corpus of eleven interactions of four persons involved in a decision-making task. Low-level features were automatically extracted from the acoustical and visual scene analysis: voice activity and fidgeting. The time dimension was taken into account by a sliding-window technique: the classifier works on all the data comprised in the time window and assigns a Task area role and a Socio-Emotional area role only at the end of the window. Windows of varying size were considered. Role assignment was modeled as a multiclass classification problem on a relatively large and very unbalanced dataset. A Support Vector Machine with a bound-constrained SV classification algorithm and an RBF kernel was used as classifier. The classification performance for the Task area roles is rather good, with an overall accuracy of 0.90, a macro precision of 0.92, and a macro recall of 0.80 with a left window of 9 seconds. The results are comparable for the Socio-Emotional area roles, with an accuracy of 0.92, a macro precision of 0.89, and a macro recall of 0.80, again with a left window of 9 seconds. All the scores tend to decrease with longer windows (Zancanaro et al. 2006).
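The approach described above can be sketched roughly as follows. This is an illustrative sketch, not the project's actual pipeline: the feature names, window aggregation, and synthetic data are assumptions, and scikit-learn's `SVC` stands in for the bound-constrained SV classifier used in the study; `class_weight="balanced"` is one common way to handle the unbalanced label distribution.

```python
# Sketch: sliding-window role classification with an RBF-kernel SVM.
# Feature choices and data are hypothetical, for illustration only.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)

def window_features(speech, fidget, win=9):
    """Aggregate per-second voice-activity and fidgeting signals over a
    left-aligned sliding window of `win` seconds (one feature row per window)."""
    feats = []
    for t in range(win, len(speech) + 1):
        s, f = speech[t - win:t], fidget[t - win:t]
        feats.append([s.mean(), s.std(), f.mean(), f.std()])
    return np.array(feats)

# Synthetic stand-in signals: one value per second for each modality.
n_sec = 400
speech = rng.random(n_sec)
fidget = rng.random(n_sec)
X = window_features(speech, fidget, win=9)

# Unbalanced multiclass labels, e.g. three hypothetical Task-area roles.
y = rng.choice(3, size=len(X), p=[0.7, 0.2, 0.1])

# RBF-kernel SVM; class_weight="balanced" reweights the skewed classes.
clf = SVC(kernel="rbf", class_weight="balanced").fit(X, y)
pred = clf.predict(X)
macro_p = precision_score(y, pred, average="macro", zero_division=0)
macro_r = recall_score(y, pred, average="macro", zero_division=0)
```

A role is thus assigned only once a full 9-second window is available, mirroring the left-window setup described above.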

Another application of these techniques was in the context of domotics (see the Netcarity project). The automatic recognition of socio-emotional behavioral patterns in elders living alone is driven by the goal of providing services and functionalities that support prolonged independent living in their own homes. The research focused both on the development of robust multimodal components and on investigating the acceptance of the technology and services by elderly people in their home environment.


The goal of this activity is to model human behavior in different settings and to build systems able to understand human behavior from simple features extracted from the acoustical and visual scene analysis. This may open new possibilities for building a next generation of context-aware systems, ranging from well-being monitoring in domotic systems to automatic coaching in teamwork. Yet it may also raise new issues, such as users' acceptability of being monitored. From a human-centred perspective, it may be questioned how the system's capabilities, i.e., its perceptual bandwidth, affect the user experience. Alongside the technical issues of building components able to detect behavior in complex environments, we also focus on the usability and acceptability of services based on multimodal monitoring of users.