- Proceedings Exploring Dynamic Parameters for Vietnamese Gender-Independent ASR
 S. Leang, É. Castelli, D. Vaufreydaz and S. Sam
 The 14th Conference on Information Technology and Its Applications (CITA 2025), Phnom Penh, Cambodia, July 2025
@inproceedings{leang:hal-05190765,
  title = {{Exploring Dynamic Parameters for Vietnamese Gender-Independent ASR}},
  author = {Leang, Sotheara and Castelli, {\'E}ric and Vaufreydaz, Dominique and Sam, Sethserey},
  booktitle = {{The 14th Conference on Information Technology and Its Applications (CITA 2025)}},
  hal_version = {v1},
  hal_id = {hal-05190765},
  pdf = {https://hal.univ-grenoble-alpes.fr/hal-05190765v1/file/paper.pdf},
  keywords = {speech dynamics ; acoustic gesture ; gender-independent speech recognition ; tonal and low-resource language},
  month = {July},
  year = {2025},
  address = {Phnom Penh, Cambodia},
  url = {https://hal.univ-grenoble-alpes.fr/hal-05190765},
  abstract = {The dynamic characteristics of the speech signal provide temporal information and play an important role in enhancing Automatic Speech Recognition (ASR). In this work, we characterized the acoustic transitions in a ratio plane of Spectral Subband Centroid Frequencies (SSCFs) using polar parameters to capture the dynamic characteristics of speech and minimize spectral variation. These dynamic parameters were combined with Mel-Frequency Cepstral Coefficients (MFCCs) in Vietnamese ASR to capture more detailed spectral information. SSCF0 was used as a pseudo-feature for the fundamental frequency (F0) to describe the tonal information robustly. The findings showed that the proposed parameters significantly reduce word error rates and exhibit greater gender independence than the baseline MFCCs.},
}
- Proceedings Teachers’ Strategies: Collaborative Learning in Second Language Classroom
 D. Vilkova, J. Burkhardt and F. Jambon
 18th International Conference on Computer-Supported Collaborative Learning (CSCL) 2025, pp. 460-464, Helsinki, Finland, June 2025
@inproceedings{vilkova:hal-05168321,
  title = {{Teachers' Strategies: Collaborative Learning in Second Language Classroom}},
  author = {Vilkova, Daria and Burkhardt, Jean-Marie and Jambon, Francis},
  booktitle = {{18th International Conference on Computer-Supported Collaborative Learning (CSCL) 2025}},
  hal_version = {v1},
  hal_id = {hal-05168321},
  pdf = {https://hal.science/hal-05168321v1/file/paper_1285%20%282%29.pdf},
  keywords = {Collaboration ; Teaching strategies ; Collaborative Second Language Learning ; Second language acquisition SLA ; Evaluation ; Teachers and educators},
  doi = {10.22318/cscl2025.597512},
  month = {June},
  year = {2025},
  pages = {460-464},
  address = {Helsinki, Finland},
  url = {https://hal.science/hal-05168321},
  abstract = {The study explores strategies teachers use to design, monitor, manage, and evaluate collaborative group activities in second language learning within higher education. Twenty second language teachers from diverse educational contexts and countries participated in semi-structured interviews. A total of 68 Critical Incidents were obtained. We elaborated a coding schema and analyzed the critical incidents, focusing on task objectives, teacher actions, student interactions, and evaluation outcomes. Teachers found that culturally relevant task content led to greater student engagement, while challenges such as high learning objectives, unclear instructions, and imbalanced group participation affected group work and task success. The study emphasizes the importance of balancing group size, task complexity, and cultural context in collaborative second language learning. The findings attempt to bridge theoretical and methodological foundations of collaborative learning with strategies applied in the classroom.},
}
- Proceedings Can Eye-Tracking Accurately Detect Human Intentions in Assembly Tasks to Enhance Human-Robot Interaction?
 P. André, M. Grand, D. Pellier and F. Jambon
 PETRA ’25: The PErvasive Technologies Related to Assistive Environments, pp. 110-119, Corfu Island, Greece, June 2025
@inproceedings{andre:hal-05171431,
  title = {{Can Eye-Tracking Accurately Detect Human Intentions in Assembly Tasks to Enhance Human-Robot Interaction?}},
  author = {Andr{\'e}, Paul and Grand, Maxence and Pellier, Damien and Jambon, Francis},
  booktitle = {{PETRA '25: The PErvasive Technologies Related to Assistive Environments}},
  hal_version = {v1},
  hal_id = {hal-05171431},
  keywords = {human intentions prediction ; head-mounted eye-tracker ; remote eye-tracker ; assembly tasks ; earliness precision trade-off},
  doi = {10.1145/3733155.3733191},
  month = {June},
  year = {2025},
  pages = {110-119},
  publisher = {{Association for Computing Machinery}},
  address = {Corfu Island, Greece},
  url = {https://hal.univ-grenoble-alpes.fr/hal-05171431},
  abstract = {In industrial environments, human-robot collaboration (cobotics) has evolved to enhance efficiency without replacing human operators. In this context, real-time prediction of human intent, closely tied to specific physical actions, is essential for cobots to interact with humans in a safe, efficient, and seamless manner. In this paper, we explore the use of eye-tracking technology to infer human intentions during the execution of a simulated assembly task. Using remote or head-mounted eye-tracking devices, we propose an approach that predicts the locations of upcoming human actions, such as grasping and releasing objects, considered from the perspectives of both localization accuracy and anticipation capability. Experimental results demonstrate the effectiveness of this approach, highlighting its potential for efficient and seamless cobotic applications in real-world assembly tasks.},
}
- Report Acceptance Assessment of CACs in Educational Context
 P. Dessus, S. Cojean, C. Michel, C. Hanner, R. Laurent, L. Gimeno and Q. Le Bideau
 May 2025
@techreport{dessus:hal-05065379,
  title = {{Acceptance Assessment of CACs in Educational Context}},
  author = {Dessus, Philippe and Cojean, Salom{\'e} and Michel, Christine and Hanner, Carole and Laurent, Romain and Gimeno, Line and Le Bideau, Quentin},
  hal_version = {v1},
  hal_id = {hal-05065379},
  keywords = {Context-aware classroom ; Acceptance assessment},
  month = {May},
  year = {2025},
  institution = {{Univ. Grenoble Alpes (France)}},
  number = {Deliverable A.2},
  url = {https://hal.science/hal-05065379},
  abstract = {This report examines the opinions of the various stakeholders likely to be involved in a Context-Aware Classroom (CAC) regarding its acceptance and acceptability in an academic environment. These opinions are important because, if unfavorable, they will hinder its use. After reviewing the main models of technology acceptance and acceptability, we focus on CACs to examine the few studies, still rare, that address these parameters. We then detail 5 empirical studies on the acceptance of new educational technologies. Studies 1 and 2 are about the acceptability of AI-based educational systems and generative AI systems, respectively. The other three studies are mostly devoted to CACs. Study 3 questions the values attached to the future use (acceptability) of CACs. Study 4 is about their acceptance, since participants actually had classes in CACs. Finally, Study 5 explores the design and prototyping of CACs and learning analytics-based systems to monitor students’ engagement, implementing a UX design method: personas.},
}
- Preprint MuRAL: A Multi-Resident Ambient Sensor Dataset Annotated with Natural Language for Activities of Daily Living
 X. Chen, J. Cumin, F. Ramparany and D. Vaufreydaz
 April 2025
@unpublished{chen:hal-05048859,
  title = {{MuRAL: A Multi-Resident Ambient Sensor Dataset Annotated with Natural Language for Activities of Daily Living}},
  author = {Chen, Xi and Cumin, Julien and Ramparany, Fano and Vaufreydaz, Dominique},
  hal_version = {v1},
  hal_id = {hal-05048859},
  pdf = {https://hal.science/hal-05048859v1/file/main.pdf},
  keywords = {Human Activity Recognition ; Dataset ; Large Language Model ; Smart Home ; IoT},
  month = {April},
  year = {2025},
  note = {https://mural.imag.fr/},
  url = {https://hal.science/hal-05048859},
  abstract = {Recent advances in Large Language Models (LLMs) have shown promising potential for human activity recognition (HAR) using ambient sensors, especially through natural language reasoning and zero-shot learning. However, existing datasets such as CASAS, ARAS, and MARBLE were not originally designed with LLMs in mind and therefore lack the contextual richness, complexity, and annotation granularity required to fully exploit LLM capabilities. In this paper, we introduce MuRAL, the first Multi-Resident Ambient sensor dataset with natural Language, comprising over 21 hours of multi-user sensor data collected from 21 sessions in a smart-home environment. MuRAL is annotated with fine-grained natural language descriptions, resident identities, and high-level activity labels, all situated in dynamic, realistic multi-resident settings. We benchmark MuRAL using state-of-the-art LLMs for three core tasks: subject assignment, action description, and activity classification. Our results demonstrate that while LLMs can provide rich semantic interpretations of ambient data, current models still face challenges in handling multi-user ambiguity and under-specified sensor contexts. We release MuRAL to support future research on LLM-powered, explainable, and socially aware activity understanding in smart environments. For access to the dataset, please reach out to us via the provided contact information. A direct link for dataset retrieval will be made available at this location in due course.},
}
- Proceedings GAIPAT – Dataset on Human Gaze and Actions for Intent Prediction in Assembly Tasks
 M. Grand, D. Pellier and F. Jambon
 HRI ’25: ACM/IEEE International Conference on Human-Robot Interaction, pp. 1015–1019, Melbourne, Australia, March 2025
@inproceedings{grand:hal-04987854,
  title = {{GAIPAT - Dataset on Human Gaze and Actions for Intent Prediction in Assembly Tasks}},
  author = {Grand, Maxence and Pellier, Damien and Jambon, Francis},
  booktitle = {{HRI '25: ACM/IEEE International Conference on Human-Robot Interaction}},
  hal_version = {v1},
  hal_id = {hal-04987854},
  pdf = {https://hal.science/hal-04987854v1/file/hgzqi_GAIPAT_Dataset_on_Human_Gaze_and_Actions_for_Intent_Prediction_in_Assembly_Tasks.pdf},
  keywords = {Eye tracking ; Human intentions prediction ; Human robot interaction},
  doi = {10.5555/3721488.3721613},
  month = {March},
  year = {2025},
  pages = {1015--1019},
  publisher = {{Association for Computing Machinery}},
  address = {Melbourne, Australia},
  url = {https://hal.science/hal-04987854},
  abstract = {The primary objective of the dataset is to provide a better understanding of the coupling between human actions and gaze in a shared working environment with a cobot, with the aim of significantly enhancing the efficiency and safety of human-cobot interactions. More broadly, by linking gaze patterns with physical actions, the dataset offers valuable insights into cognitive processes and attention dynamics in the context of assembly tasks. The proposed dataset contains gaze and action data from approximately 80 participants, recorded during simulated industrial assembly tasks. The tasks were simulated using controlled scenarios in which participants manipulated educational building blocks. Gaze data was collected using two different eye-tracking setups (head-mounted and remote) while participants worked in two positions: sitting and standing.},
}
- Preprint TaskVAE: Task-Specific Variational Autoencoders for Exemplar Generation in Continual Learning for Human Activity Recognition
 B. Kann, S. Castellanos-Paez, R. Rombourg and P. Lalanda
 2025
@unpublished{kann2025taskvaetaskspecificvariationalautoencoders,
  title = {TaskVAE: Task-Specific Variational Autoencoders for Exemplar Generation in Continual Learning for Human Activity Recognition},
  author = {Bonpagna Kann and Sandra Castellanos-Paez and Romain Rombourg and Philippe Lalanda},
  url = {https://arxiv.org/abs/2506.01965},
  primaryclass = {cs.LG},
  archiveprefix = {arXiv},
  eprint = {2506.01965},
  year = {2025},
}
- Journal Comparing Self-Supervised Learning Techniques for Wearable Human Activity Recognition
 S. Ek, R. Presotto, G. Civitarese, F. Portet, P. Lalanda and C. Bettini
 CCF Transactions on Pervasive Computing and Interaction, 2025
@article{ek:hal-05014418,
  title = {{Comparing Self-Supervised Learning Techniques for Wearable Human Activity Recognition}},
  author = {Ek, Sannara and Presotto, Riccardo and Civitarese, Gabriele and Portet, Fran{\c c}ois and Lalanda, Philippe and Bettini, Claudio},
  journal = {{CCF Transactions on Pervasive Computing and Interaction}},
  hal_version = {v1},
  hal_id = {hal-05014418},
  pdf = {https://hal.science/hal-05014418v1/file/2404.15331v1.pdf},
  keywords = {Self-Supervised Learning ; Human Activity Recognition},
  doi = {10.1007/s42486-024-00182-9},
  year = {2025},
  publisher = {{Springer}},
  url = {https://hal.science/hal-05014418},
  abstract = {Human Activity Recognition (HAR) based on the sensors of mobile/wearable devices aims to detect the physical activities performed by humans in their daily lives. Although supervised learning methods are the most effective in this task, their effectiveness is constrained to using a large amount of labeled data during training. While collecting raw unlabeled data can be relatively easy, annotating data is challenging due to costs, intrusiveness, and time constraints. To address these challenges, this paper explores alternative approaches for accurate HAR using a limited amount of labeled data. In particular, we have adapted recent Self-Supervised Learning (SSL) algorithms to the HAR domain and compared their effectiveness. We investigate three state-of-the-art SSL techniques of different families: contrastive, generative, and predictive. Additionally, we evaluate the impact of the underlying neural network on the recognition rate by comparing state-of-the-art CNN and transformer architectures. Our results show that a Masked Auto Encoder (MAE) approach significantly outperforms other SSL approaches, including SimCLR, commonly considered one of the best-performing SSL methods in the HAR domain. The code and the pre-trained SSL models are publicly available for further research and development.},
}
- Journal Autoregressive GAN for Semantic Unconditional Head Motion Generation
 L. Airale, X. Alameda-Pineda, S. Lathuilière and D. Vaufreydaz
 ACM Transactions on Multimedia Computing, Communications and Applications, vol. 21, no. 1, pp. 14:1-14, December 2024
@article{airale:hal-03833759,
  title = {{Autoregressive GAN for Semantic Unconditional Head Motion Generation}},
  author = {Airale, Louis and Alameda-Pineda, Xavier and Lathuili{\`e}re, St{\'e}phane and Vaufreydaz, Dominique},
  journal = {{ACM Transactions on Multimedia Computing, Communications and Applications}},
  hal_version = {v3},
  hal_id = {hal-03833759},
  pdf = {https://inria.hal.science/hal-03833759v3/file/SUHMo.pdf},
  keywords = {GAN ; Head motion ; Face landmarks},
  doi = {10.1145/3635154},
  month = {December},
  year = {2024},
  pages = {14:1-14},
  number = {1},
  volume = {21},
  publisher = {{Association for Computing Machinery}},
  url = {https://inria.hal.science/hal-03833759},
  abstract = {In this work, we address the task of unconditional head motion generation to animate still human faces in a low-dimensional semantic space from a single reference pose. Different from traditional audio-conditioned talking head generation that seldom puts emphasis on realistic head motions, we devise a GAN-based architecture that learns to synthesize rich head motion sequences over long durations while maintaining low error accumulation levels. In particular, the autoregressive generation of incremental outputs ensures smooth trajectories, while a multi-scale discriminator on input pairs drives generation toward better handling of high- and low-frequency signals and less mode collapse. We experimentally demonstrate the relevance of the proposed method and show its superiority compared to models that attained state-of-the-art performance on similar tasks.},
}
- Ph.D. Thesis Object Discovery in Images, Videos, and 3D Scenes
 Yangtao Wang
 November 2024
@phdthesis{wang:tel-05093977,
  title = {{Object Discovery in Images, Videos, and 3D Scenes}},
  author = {Wang, Yangtao},
  hal_version = {v1},
  hal_id = {tel-05093977},
  pdf = {https://theses.hal.science/tel-05093977v1/file/WANG_2024_archivage.pdf},
  type = {Theses},
  keywords = {Object discovery ; Computer Vision ; Transformers ; Multimodal foundational model ; Deep Learning ; Artificial Intelligence},
  month = {November},
  year = {2024},
  school = {{Universit{\'e} Grenoble Alpes [2020-....]}},
  number = {2024GRALM062},
  url = {https://theses.hal.science/tel-05093977},
  abstract = {Object Discovery is the task of detecting and segmenting semantically coherent regions of images. Object discovery in images is fundamentally harder than the classic computer vision tasks of object detection or segmentation, due to the possibility of regions that correspond to previously unseen object categories, as well as variations of unseen object appearance due to differences in viewpoint, scale, and lighting conditions. Robust discovery and segmentation of images of previously unseen objects requires extremely general features that can accommodate variations in object appearance, occlusion, and background clutter. Our research began with an investigation of the possibility of using the latent variables from self-supervised Vision Transformers as features for unsupervised object discovery in images. This led to a simple yet effective algorithm, TokenCut, that is described in Chapter 3 of this thesis. TokenCut has been shown to be effective for unsupervised object discovery, unsupervised saliency detection and weakly supervised object localization tasks using a variety of datasets. Following our success with unsupervised object discovery in images, we have extended TokenCut to unsupervised object detection in video using motion and appearance. The enhanced TokenCut algorithm integrates RGB appearance and optical flow features across video frames, creating a comprehensive graph that allows for the detection and segmentation of moving objects. This extension, described in Chapter 4, demonstrates a unified approach to discovering objects in both static and dynamic scenes, highlighting the robustness and effectiveness of the TokenCut algorithm. Encouraged by the success of our work on object discovery in videos, we turned our attention to the problem of consistent segmentation of 3D objects in 3D scenes using natural language queries. In Chapter 5, we describe a novel approach that integrates 3D Gaussian splatting with pretrained multimodal language models. This method automates the generation and annotation of 3D masks, enabling object segmentation based on textual queries and demonstrating effectiveness on a number of relevant datasets. These results provide an additional demonstration of the power of large foundational models for responding to long-term hard challenges in Computer Vision and Artificial Intelligence.},
}
- Ph.D. Thesis Personalized federated learning for sensor-based human activity recognition in pervasive heterogeneous environments
 Sannara Ek
 November 2024
@phdthesis{ek:tel-05056717,
  title = {{Personalized federated learning for sensor-based human activity recognition in pervasive heterogeneous environments}},
  author = {Ek, Sannara},
  hal_version = {v1},
  hal_id = {tel-05056717},
  pdf = {https://theses.hal.science/tel-05056717v1/file/EK_2024_archivage.pdf},
  type = {Theses},
  keywords = {Pervasive Computing ; Distributed Architecture ; Federated Learning ; Resource Management},
  month = {November},
  year = {2024},
  school = {{Universit{\'e} Grenoble Alpes [2020-....]}},
  number = {2024GRALM059},
  url = {https://theses.hal.science/tel-05056717},
  abstract = {Recent advancements in sensor technology and mobile computing have significantly enhanced pervasive computing applications, integrating smart devices into our environments to offer diverse user-oriented services. These services are further augmented by Machine Learning (ML) models that are increasingly embedded within these devices to leverage their computational power and abundant data. However, adapting ML for a user-centric paradigm, where the priority must be that local models be well personalized and generalized while ensuring data privacy, presents challenges. Federated Learning (FL), a meta-learning client-server framework, provides a promising solution by avoiding the need to communicate user data. Traditionally server-centric, optimizing a single generalized global model, FL leverages local devices' computing abilities and their data for training. However, to fully meet pervasive computing needs, a shift towards a client-centric paradigm is essential. This thesis investigates the application and challenges of client-centric FL in the sensor-based domain of Human Activity Recognition (HAR), which involves predicting physical movements from mobile devices like smartphones and smartwatches. We explore the benefits and limitations of this approach by devising several new evaluations that highlight the detrimental effects of heterogeneity among clients' devices. Additionally, we propose necessary contributions to mitigate these effects, aiming to enhance the overall performance and reliability of client-centric FL in HAR applications. To address the heterogeneity limitation, we present a novel FL aggregation technique that dynamically adjusts the model's architecture to suit the unique traits of individual clients. We then adopt lightweight transformer-based HAR architectures that are robust to changing environments and user habits. Additionally, we develop a novel pre-training pipeline using several public datasets to reduce the data requirements for local fine-tuning. Afterwards, we explore three categories of self-supervised learning techniques to further enhance the robustness of client models by utilizing unlabeled data. Finally, we introduce an embedding-to-prototype matching mechanism via an optimal transport plan to regularize clients within the FL framework, enforcing weight similarity and promoting model consistency.},
}
- Ph.D. Thesis Utilisation de l’intelligence artificielle pour le contrôle de convertisseur grid-forming pour micro-réseau
 Hassan Issa
 Université Grenoble Alpes, France, October 2024
@phdthesis{issa2024,
  title = {{Utilisation de l'intelligence artificielle pour le contrôle de convertisseur grid-forming pour micro-réseau}},
  author = {Issa, Hassan},
  type = {Electrical Engineering},
  month = {October},
  year = {2024},
  address = {Universit{\'e} Grenoble Alpes, France},
  school = {{Universit{\'e} Grenoble Alpes}},
}
- Ph.D. Thesis Autonomous Driving in Urban Environments in the presence of Pedestrians using Deep Reinforcement Learning
 Niranjan Deshpande
 Université Grenoble Alpes, France, October 2024
@phdthesis{deshpande2024,
  title = {{Autonomous Driving in Urban Environments in the presence of Pedestrians using Deep Reinforcement Learning}},
  author = {Deshpande, Niranjan},
  type = {Computer Science},
  month = {October},
  year = {2024},
  address = {Universit{\'e} Grenoble Alpes, France},
  school = {{Universit{\'e} Grenoble Alpes}},
}
- Proceedings Towards LLM-Powered Ambient Sensor Based Multi-Person Human Activity Recognition
 X. Chen, J. Cumin, F. Ramparany and D. Vaufreydaz
 The 30th International Conference on Parallel and Distributed Systems, Belgrade, Serbia, October 2024
@inproceedings{chen:hal-04619086,
  title = {{Towards LLM-Powered Ambient Sensor Based Multi-Person Human Activity Recognition}},
  author = {Chen, Xi and Cumin, Julien and Ramparany, Fano and Vaufreydaz, Dominique},
  booktitle = {{The 30th International Conference on Parallel and Distributed Systems}},
  hal_version = {v2},
  hal_id = {hal-04619086},
  pdf = {https://hal.science/hal-04619086v2/file/main.pdf},
  keywords = {Human Activity Recognition ; Large Language Model ; Smart Home ; IoT},
  month = {October},
  year = {2024},
  address = {Belgrade, Serbia},
  url = {https://hal.science/hal-04619086},
  abstract = {Human Activity Recognition (HAR) is one of the central problems in fields such as healthcare, elderly care, and security at home. However, traditional HAR approaches face challenges including data scarcity, difficulties in model generalization, and the complexity of recognizing activities in multi-person scenarios. This paper proposes a system framework called LAHAR, based on large language models. Utilizing prompt engineering techniques, LAHAR addresses HAR in multi-person scenarios by enabling subject separation and action-level descriptions of events occurring in the environment. We validated our approach on the ARAS dataset, and the results demonstrate that LAHAR achieves comparable accuracy to the state-of-the-art method at higher resolutions and maintains robustness in multi-person scenarios.},
}
- Proceedings Exploring VQ-VAE with Prosody Parameters for Speaker Anonymization
 S. Leang, A. Augusma, E. Castelli, F. Letué, S. Sam and D. Vaufreydaz
 Voice Privacy Challenge 2024 at INTERSPEECH 2024, Kos Island, Greece, September 2024
@inproceedings{leang:hal-04706860,
  title = {{Exploring VQ-VAE with Prosody Parameters for Speaker Anonymization}},
  author = {Leang, Sotheara and Augusma, Anderson and Castelli, Eric and Letu{\'e}, Fr{\'e}d{\'e}rique and Sam, Sethserey and Vaufreydaz, Dominique},
  booktitle = {{Voice Privacy Challenge 2024 at INTERSPEECH 2024}},
  hal_version = {v1},
  hal_id = {hal-04706860},
  pdf = {https://inria.hal.science/hal-04706860v1/file/VQVAEForSpeakerAnonymization.pdf},
  keywords = {speech anonymization ; speech synthesis ; vector-quantized variational auto-encoder ; emotional state},
  month = {September},
  year = {2024},
  address = {Kos Island, Greece},
  url = {https://inria.hal.science/hal-04706860},
  abstract = {Human speech conveys prosody, linguistic content, and speaker identity. This article investigates a novel speaker anonymization approach using an end-to-end network based on a Vector-Quantized Variational Auto-Encoder (VQ-VAE) to deal with these speech components. This approach is designed to disentangle these components to specifically target and modify the speaker identity while preserving the linguistic and emotional content. To do so, three separate branches compute embeddings for content, prosody, and speaker identity respectively. During synthesis, taking these embeddings, the decoder of the proposed architecture is conditioned on both speaker and prosody information, allowing for capturing more nuanced emotional states and precise adjustments to speaker identification. Findings indicate that this method outperforms most baseline techniques in preserving emotional information. However, it exhibits more limited performance on other voice privacy tasks, emphasizing the need for further improvements.},
}
- Proceedings COCOR: Training and Assessing Rotation Invariance in Object and Human (Pose) Detection Tasks
 R. Ly, D. Vaufreydaz, E. Castelli and S. Sam
 The 13th Conference on Information Technology and its Applications (CITA2024), Da Nang City, Vietnam, July 2024
@inproceedings{ly:hal-04706818,
  title = {{COCOR: Training and Assessing Rotation Invariance in Object and Human (Pose) Detection Tasks}},
  author = {Ly, Rottana and Vaufreydaz, Dominique and Castelli, Eric and Sam, Sethserey},
  booktitle = {{The 13th Conference on Information Technology and its Applications (CITA2024)}},
  hal_version = {v1},
  hal_id = {hal-04706818},
  keywords = {Rotation invariance evaluation ; Human detection ; Pose detection ; Object detection},
  month = {July},
  year = {2024},
  address = {Da Nang City, Vietnam},
  url = {https://hal.science/hal-04706818},
  abstract = {The performance of neural networks on human (pose) detection has significantly increased in recent years. However, detecting humans in different poses or positions, with partial occlusions, and at multiple scales remains challenging. The same conclusion arises if we consider object detection tasks. In the context of this research, we focus on the rotation sensitivity in object detection and in human (pose) detection tasks for state-of-the-art neural networks. To the best of our knowledge, there are few corpora dedicated to the rotation problem and, for people detection, to fall or fallen person detection, but none contain all rotation angles of the image that could be used to train or evaluate machine learning systems towards rotation invariance. This research proposes two variants of the COCO dataset. COCOR is a rotated version of the standard COCO dataset for object and human (pose) detections while COCOR-OBB provides oriented bounding boxes information as people annotation. The implementation details concerning the construction of COCOR and COCOR-OBB are depicted in this article. Providing baseline evaluation of SOTA systems, COCOR can be used as a benchmark dataset for rotation invariance evaluation in vision tasks, including object detection and human (pose) estimation.},
}
- Preprint Leveraging Task-Specific VAEs for Efficient Exemplar Generation in HAR
 K. Bonpagna, S. Castellanos-Paez, R. Rombourg and P. Lalanda
 May 2024
@unpublished{bonpagna:hal-04595564,
  title = {{Leveraging Task-Specific VAEs for Efficient Exemplar Generation in HAR}},
  author = {Bonpagna, Kann and Castellanos-Paez, Sandra and Rombourg, Romain and Lalanda, Philippe},
  hal_version = {v1},
  hal_id = {hal-04595564},
  pdf = {https://hal.univ-grenoble-alpes.fr/hal-04595564v1/file/VAE_CL.pdf},
  keywords = {Continual Learning ; HAR ; Replay Methods ; VAE},
  month = {May},
  year = {2024},
  note = {working paper or preprint},
  url = {https://hal.univ-grenoble-alpes.fr/hal-04595564},
  abstract = {The emerging technologies of smartphones and wearable devices have transformed Human Activity Recognition (HAR), offering a rich source of sensor data for building an automated system to recognize people's daily activities. The sensor-based HAR data also enables Machine Learning (ML) algorithms to classify various activities, indicating a new era of intelligent systems for health monitoring and diagnostics. However, integrating ML into these systems faces the challenge of catastrophic forgetting, where models lose proficiency in previously learned activities when introduced to new ones by users. Continual Learning (CL) has emerged as a solution, enabling models to learn continuously from evolving data streams while reducing forgetting of past knowledge. Within CL methodologies, the use of generative models such as Variational Autoencoders (VAEs) has drawn significant interest for their capacity to generate synthetic data. This reduces storage demands by creating on-demand samples. However, the application of VAEs with a CL classifier has been limited to low-dimensional data or fine-grained features, leaving a gap in harnessing raw, high-dimensional sensor data for the HAR model. Our research aims to bridge this gap by constructing VAEs with a filtering mechanism for direct training with raw sensor data from the HAR dataset, enhancing CL models' capability in a class-incremental learning scenario. We demonstrate that a VAE with a boundary box sampling and filtering process significantly outperforms both traditional and hybrid exemplar CL methods, offering a more balanced and diverse training set that enhances the knowledge acquisition of the model. Our findings also emphasize the importance of sampling strategies in the latent space of VAEs to maximize data diversity, crucial for recognizing the variability in human activities for better representation of each activity in each CL task.},
}
- Proceedings Generative Resident Separation and Multi-label Classification for Multi-person Activity Recognition
 X. Chen, J. Cumin, F. Ramparany and D. Vaufreydaz
 Context and Activity Modeling and Recognition (CoMoReA) Workshop at IEEE International Conference on Pervasive Computing and Communications (PerCom 2024), Biarritz, France, March 2024
@inproceedings{chen:hal-04538267,
  title = {{Generative Resident Separation and Multi-label Classification for Multi-person Activity Recognition}},
  author = {Chen, Xi and Cumin, Julien and Ramparany, Fano and Vaufreydaz, Dominique},
  booktitle = {{Context and Activity Modeling and Recognition (CoMoReA) Workshop at IEEE International Conference on Pervasive Computing and Communications (PerCom 2024)}},
  hal_version = {v1},
  hal_id = {hal-04538267},
  pdf = {https://hal.science/hal-04538267v1/file/main.pdf},
  keywords = {Human Activity Recognition ; Deep learning ; Ambient intelligence ; Smart homes},
  month = {March},
  year = {2024},
  address = {Biarritz, France},
  url = {https://hal.science/hal-04538267},
  abstract = {This paper presents two models to address the problem of multi-person activity recognition using ambient sensors in a home. The first model, Seq2Res, uses a sequence generation approach to separate sensor events from different residents. The second model, BiGRU+Q2L, uses a Query2Label multi-label classifier to predict multiple activities simultaneously. Performances of these models are compared to a state-of-the-art model in different experimental scenarios, using a state-of-the-art dataset of two residents in a home instrumented with ambient sensors. These results lead to a discussion on the advantages and drawbacks of resident separation and multi-label classification for multi-person activity recognition.},
}
- Proceedings Cross-Dataset Continual Learning: Assessing Pre-Trained Models to Enhance Generalization in HAR
 B. Kann, S. Castellanos-Paez, P. Lalanda and S. Sam
 2024 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), pp. 1-6, Biarritz, France, March 2024
@inproceedings{kann:hal-04937955,
  title = {{Cross-Dataset Continual Learning: Assessing Pre-Trained Models to Enhance Generalization in HAR}},
  author = {Kann, Bonpagna and Castellanos-Paez, Sandra and Lalanda, Philippe and Sam, Sethserey},
  booktitle = {{2024 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops)}},
  hal_version = {v1},
  hal_id = {hal-04937955},
  pdf = {https://hal.univ-grenoble-alpes.fr/hal-04937955v1/file/Cross_Dataset_Continual_Learning__Leveraging_Pre_Trained_Models_to_Enhance_Generalization_in_HAR.pdf},
  keywords = {Pretrained model ; HAR ; Continual learning},
  doi = {10.1109/PerComWorkshops59983.2024.10502819},
  month = {March},
  year = {2024},
  pages = {1-6},
  publisher = {{IEEE}},
  address = {Biarritz, France},
  url = {https://hal.univ-grenoble-alpes.fr/hal-04937955},
  abstract = {Pervasive computing has profoundly transformed the way in which companies provide and develop innovative services across various sectors. In the healthcare domain, for instance, smartphones equipped with sensors can be used to collect data to enhance health diagnostics and analysis. Using such data in conjunction with Machine Learning (ML) models for Human Activity Recognition (HAR) has gained significant attention, as it offers promising avenues for healthcare innovation and personalized services. However, traditional ML models often struggle to adapt to evolving data streams over time. To address this issue, the introduction of Continual Learning (CL) has become crucial, ensuring that models can accumulate knowledge over time and continually improve their performance in dynamic environments. This, however, raises several major issues related, for example, to catastrophic forgetting as well as to the size of the datasets. Here, the typical size of HAR datasets is relatively small, which can be an issue when conducting training in CL from scratch. To mitigate this challenge, starting the CL process with pre-trained models has emerged as a promising strategy. In this context, the purpose of this paper is twofold. First, we analyze the impact of conducting CL on a target dataset when starting with a pre-trained model initially built with limited data from a similar dataset. Furthermore, we investigate the effect of using a model pre-trained on a large dataset on the CL process conducted on a smaller target dataset. Our experiments on the UCI HAR and the USC HAD datasets showed that CL significantly improves model accuracy when starting with a pre-trained model with limited initial data. However, the choice of the pre-trained model and dataset for CL is crucial. Using a pre-trained model from a more complex dataset can lead to better CL accuracy when moving to a simpler dataset.},
}
- Ph.D. Thesis Adversarial learning methods for the generation of human interaction data
 Louis Airale
 December 2023
@phdthesis{airale:tel-04612415,
  title = {{Adversarial learning methods for the generation of human interaction data}},
  author = {Airale, Louis},
  hal_version = {v1},
  hal_id = {tel-04612415},
  pdf = {https://theses.hal.science/tel-04612415v1/file/AIRALE_2023_archivage.pdf},
  type = {Theses},
  keywords = {Generative models ; Adversarial learning ; Human interaction data ; Mod{\`e}les g{\'e}n{\'e}ratifs ; Apprentissage adverse ; Interaction humaine},
  month = {December},
  year = {2023},
  school = {{Universit{\'e} Grenoble Alpes [2020-....]}},
  number = {2023GRALM072},
  url = {https://theses.hal.science/tel-04612415},
  abstract = {The objective of this thesis work is to explore new deep generative model architectures for diverse human interaction data generation tasks. The applications for such systems are various: social robotics, animation or entertainment, but always pertain to building more natural interactive systems between humans and machines. Owing to their astonishing performances on a wide range of applications, deep generative models offer an ideal framework to address this task. In return, one can learn how to improve the training of such models by adjusting them to tackle the challenges and constraints posed by human interaction data generation. In this thesis, we consider three generation tasks, corresponding to as many target modalities or conditioning signals. Interactions are first modeled as sequences of discrete, high-level actions simultaneously achieved by a free number of participants. Then, we consider the continuous facial dynamics of a conversing individual and attempt to produce realistic animations from a single reference frame in the facial landmark domain. Finally, we address the task of co-speech talking face generation, where the aim is to correlate the output head and lips motion with an input speech signal. Interestingly, similar deep generative models based on autoregressive adversarial networks provide state-of-the-art results on these otherwise slightly related tasks. While generative adversarial networks (GANs) improve the performances of maximum likelihood estimation methods, especially for the generation of continuous sequences, autoregressive models make it possible to produce sequences of arbitrary length. Training such models can, however, be long or unstable, in particular when the conditioning signal is weak (e.g. when only an initial state is provided). In light of this, we first devise an autoregressive generative adversarial network (GAN) for the generation of discrete interaction sequences, where we introduce a window-based discriminator network that accelerates the training and improves the output quality. We then scale this approach to the generation of continuous facial landmark coordinates, and exploit the inductive bias of autoregressive models for cumulative sums via residual predictions. In this unconditional setting, jointly generating and discriminating pairs of samples proved essential to allow long-term consistency and reduce mode collapse. In the third and last chapter, we introduce a multi-scale loss function and a multi-scale generator network to allow our autoregressive GAN to produce, for the first time, speech-correlated head and lips motion over multiple timescales. Experiments conducted on benchmark datasets featuring multiple interaction data modalities illustrate the efficiency of the proposed methods.},
}
- Journal Transformer-based models to deal with heterogeneous environments in Human Activity Recognition
 S. Ek, F. Portet and P. Lalanda
 Personal and Ubiquitous Computing, November 2023
@article{ek:hal-04277025,
  title = {{Transformer-based models to deal with heterogeneous environments in Human Activity Recognition}},
  author = {Ek, Sannara and Portet, Fran{\c c}ois and Lalanda, Philippe},
  journal = {{Personal and Ubiquitous Computing}},
  hal_version = {v1},
  hal_id = {hal-04277025},
  doi = {10.1007/s00779-023-01776-3},
  month = {November},
  year = {2023},
  publisher = {{Springer Verlag}},
  url = {https://hal.science/hal-04277025},
  abstract = {Human Activity Recognition (HAR) on mobile devices has been shown to be achievable with lightweight neural models learned from data generated by the user's inertial measurement units (IMUs). Most approaches for instance-based HAR have used Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), or a combination of the two to achieve state-of-the-art results with real-time performance. Recently, the Transformer architecture, first in the language processing domain and then in the vision domain, has pushed the state of the art further than classical architectures. However, such a Transformer architecture is heavyweight in computing resources, which makes it poorly suited to the embedded HAR applications found in the pervasive computing domain. In this study, we present Human Activity Recognition Transformer (HART), a lightweight, sensor-wise transformer architecture that has been specifically adapted to the domain of the IMUs embedded on mobile devices. Our experiments on HAR tasks with several publicly available datasets show that HART uses fewer FLoating-point Operations Per Second (FLOPS) and parameters while outperforming current state-of-the-art results. Furthermore, we present evaluations across various architectures of their performance in heterogeneous environments and show that our models can better generalize on different sensing devices or on-body positions.},
}
- Ph.D. Thesis Towards rotation invariance for neural networks: Application to human (pose) detection
 Rottana Ly
 November 2023
@phdthesis{ly:tel-04854310,
  title = {{Towards rotation invariance for neural networks: Application to human (pose) detection}},
  author = {Ly, Rottana},
  hal_version = {v1},
  hal_id = {tel-04854310},
  pdf = {https://hal.science/tel-04854310v1/file/116150_LY_2023_archivage.pdf},
  type = {Theses},
  keywords = {Neural networks ; human detection ; pose detection ; rotation invariance ; r{\'e}seaux de neurones ; d{\'e}tection de personnes ; d{\'e}tection de pose ; invariance {\`a} la rotation},
  month = {November},
  year = {2023},
  school = {{Universit{\'e} Grenoble Alpes (UGA)}},
  url = {https://hal.science/tel-04854310},
  abstract = {Current neural networks have achieved significant performance on many vision tasks, including object detection and human pose estimation. However, detecting objects in different orientations, with partial occlusions, and at multiple scales is still challenging. This thesis addresses the unsolved rotation invariance problem in current neural networks. In real-life applications, unlike in the available training data, objects or people must be detected at any orientation within the image. For instance, when a companion robot is searching for an elderly or fragile person who may have fallen, the robot needs to be able to detect the person whatever their orientation within the image; a rotation-invariant neural network is therefore of interest in this case. To solve the problem of rotation invariance, several approaches have already been proposed in the literature in other contexts. Parameter correction in neural networks and local person orientation approaches do not solve the rotation invariance problem. Our proposal for global correction of image orientation offers better results at different orientations, but data preprocessing is required. Multi-orientation feature computation approaches give good results for image classification but have not been successfully applied to tasks such as object or person detection. To take into account the object's spatial representation, we propose the "Spatial Wise Rotation Invariant Transformer (SWRIT)", which calculates rotation-invariant features through orientation-sensitive attention while preserving their spatial organization. SWRIT can be integrated into any convolutional or transformer-based neural architecture, framing the backbone feature extraction network and enabling the learning of less rotation-sensitive features. Based on experiments with COCOR, a rotation-invariant evaluation dataset we are offering to the community, we show that SWRIT improves the performance of current neural networks regarding rotation robustness.},
}
- Ph.D. Thesis Multimodal transformers for emotion recognition
 Juan Fernando Vazquez Rodriguez
 November 2023
@phdthesis{vazquezrodriguez:tel-04542869,
  title = {{Multimodal transformers for emotion recognition}},
  author = {Vazquez Rodriguez, Juan Fernando},
  hal_version = {v1},
  hal_id = {tel-04542869},
  pdf = {https://theses.hal.science/tel-04542869v1/file/VAZQUEZ_RODRIGUEZ_2023_archivage.pdf},
  type = {Theses},
  keywords = {Artificial Intelligence ; Healthy Aging ; Emotion Recognition ; Multimodal Transformers ; Affective Computing ; Machine Learning ; Reconnaissance des Emotions ; Transformers Multimodaux ; Informatique Affective ; Apprentissage Automatique ; Intelligence Artificielle ; Vieillissement en Bonne Sant{\'e}},
  month = {November},
  year = {2023},
  school = {{Universit{\'e} Grenoble Alpes [2020-....]}},
  number = {2023GRALM057},
  url = {https://theses.hal.science/tel-04542869},
  abstract = {Mental health and emotional well-being have significant influence on physical health, and are especially important for healthy aging. Continued progress on sensors and microelectronics has provided a number of new technologies that can be deployed in homes and used to monitor health and well-being. These can be combined with recent advances in machine learning to provide services that enhance the physical and emotional well-being of individuals to promote healthy aging. In this context, an automatic emotion recognition system can provide a tool to help assure the emotional well-being of frail people. Therefore, it is desirable to develop a technology that can draw information about human emotions from multiple sensor modalities and can be trained without the need for large labeled training datasets. This thesis addresses the problem of emotion recognition using the different types of signals that a smart environment may provide, such as visual, audio, and physiological signals. To do this, we develop different models based on the Transformer architecture, which has useful characteristics such as its capacity to model long-range dependencies, as well as its capability to discern the relevant parts of the input. We first propose a model to recognize emotions from individual physiological signals. We propose a self-supervised pre-training technique that uses unlabeled physiological signals, and show that this pre-training technique helps the model to perform better. This approach is then extended to take advantage of the complementarity of information that may exist in different physiological signals. For this, we develop a model that combines different physiological signals and also uses self-supervised pre-training to improve its performance. We propose a method for pre-training that does not require a dataset with the complete set of target signals, but can instead be trained on individual datasets from each target signal. To further take advantage of the different modalities that a smart environment may provide, we also propose a model that uses as inputs multimodal signals such as video, audio, and physiological signals. Since these signals are of a different nature, they cover different ways in which emotions are expressed; they should thus provide complementary information concerning emotions, and it is therefore appealing to use them together. However, in real-world scenarios, there might be cases where a modality is missing. Our model is flexible enough to continue working when a modality is missing, albeit with a reduction in its performance. To address this problem, we propose a training strategy that reduces the drop in performance when a modality is missing. The methods developed in this thesis are evaluated using several datasets, obtaining results that demonstrate the effectiveness of our approach to pre-train Transformers to recognize emotions from physiological signals. The results also show the efficacy of our Transformer-based solution to aggregate multimodal information, and to accommodate missing modalities. These results demonstrate the feasibility of the proposed approaches to recognizing emotions from multiple environmental sensors. This opens new avenues for deeper exploration of using Transformer-based approaches to process information from environmental sensors and allows the development of emotion recognition technologies robust to missing modalities. The results of this work can contribute to better care for the mental health of frail people.},
}
- Preprint A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation
 L. Airale, D. Vaufreydaz and X. Alameda-Pineda
 July 2023
@unpublished{airale:hal-04149083,
  title = {{A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation}},
  author = {Airale, Louis and Vaufreydaz, Dominique and Alameda-Pineda, Xavier},
  hal_version = {v2},
  hal_id = {hal-04149083},
  pdf = {https://hal.science/hal-04149083v2/file/main.pdf},
  keywords = {Talking Head ; Generative adversarial network ; Speech representation ; Computer vision},
  month = {July},
  year = {2023},
  note = {working paper or preprint},
  url = {https://hal.science/hal-04149083},
  abstract = {Animating still face images with deep generative models using a speech input signal is an active research topic and has seen important recent progress. However, much of the effort has been put into lip syncing and rendering quality while the generation of natural head motion, let alone the audio-visual correlation between head motion and speech, has often been neglected. In this work, we propose a multi-scale audio-visual synchrony loss and a multi-scale autoregressive GAN to better handle short and long-term correlation between speech and the dynamics of the head and lips. In particular, we train a stack of syncer models on multimodal input pyramids and use these models as guidance in a multi-scale generator network to produce audio-aligned motion unfolding over diverse time scales. Both the pyramid of audio-visual syncers and the generative models are trained in a low-dimensional space that fully preserves dynamics cues. The experiments show significant improvements over the state-of-the-art in head motion dynamics quality and especially in multi-scale audio-visual synchrony on a collection of benchmark datasets.},
}
- Proceedings Multimodal Group Emotion Recognition In-the-wild Using Privacy-Compliant Features
 A. Augusma, D. Vaufreydaz and F. Letué
 ICMI ’23: International Conference on Multimodal Interaction, pp. 750-754, Paris, France, October 2023
@inproceedings{augusma:hal-04325815,
  title = {{Multimodal Group Emotion Recognition In-the-wild Using Privacy-Compliant Features}},
  author = {Augusma, Anderson and Vaufreydaz, Dominique and Letu{\'e}, Fr{\'e}d{\'e}rique},
  booktitle = {{ICMI '23: International Conference on Multimodal Interaction}},
  hal_version = {v1},
  hal_id = {hal-04325815},
  pdf = {https://hal.science/hal-04325815v1/file/MgEmoR-pcf-Emotiw2023.pdf},
  keywords = {Transformer networks ; Group emotion recognition in-the-wild ; Multimodal ; Privacy safe},
  doi = {10.1145/3577190.3616546},
  month = {October},
  year = {2023},
  pages = {750-754},
  publisher = {{ACM}},
  address = {Paris, France},
  url = {https://hal.science/hal-04325815},
  abstract = {This paper explores privacy-compliant group-level emotion recognition "in-the-wild" within the EmotiW Challenge 2023. Group-level emotion recognition can be useful in many fields including social robotics, conversational agents, e-coaching and learning analytics. This research restricts itself to global features and avoids individual ones, i.e. all features that can be used to identify or track people in videos (facial landmarks, body poses, audio diarization, etc.). The proposed multimodal model is composed of a video branch and an audio branch with cross-attention between modalities. The video branch is based on a fine-tuned ViT architecture. The audio branch extracts Mel-spectrograms and feeds them through CNN blocks into a transformer encoder. Our training paradigm includes a generated synthetic dataset to increase the sensitivity of our model to facial expressions within the image in a data-driven way. The extensive experiments show the significance of our methodology. Our privacy-compliant proposal performs fairly on the EmotiW challenge, with accuracies of 79.24% and 75.13% on the validation and test sets, respectively, for the best models. Notably, our findings highlight that it is possible to reach this accuracy level with privacy-compliant features using only 5 frames uniformly distributed over the video.},
}
- proceedings ICMI ’23: Proceedings of the 25th International Conference on Multimodal Interaction
 E. André, M. Chetouani, D. Vaufreydaz, G. Lucas, T. Schultz, L. Morency and A. Vinciarelli
 October 2023
@proceedings{andre:hal-04317088,
  title = {{ICMI '23: Proceedings of the 25th International Conference on Multimodal Interaction}},
  author = {Andr{\'e}, Elisabeth and Chetouani, Mohamed and Vaufreydaz, Dominique and Lucas, Gale and Schultz, Tanja and Morency, Louis-Philippe and Vinciarelli, Alessandro},
  hal_version = {v1},
  hal_id = {hal-04317088},
  doi = {10.1145/3577190},
  month = {October},
  year = {2023},
  publisher = {{ACM}},
  url = {https://hal.science/hal-04317088},
  abstract = {},
}
- Proceedings Accommodating Missing Modalities in Time-Continuous Multimodal Emotion Recognition
 J. Vazquez-Rodriguez, G. Lefebvre, J. Cumin and J. L. Crowley
 Affective Computing and Intelligent Interaction (ACII), Cambridge (MA), United States, September 2023
@inproceedings{vazquezrodriguez:hal-04287765,
  title = {{Accommodating Missing Modalities in Time-Continuous Multimodal Emotion Recognition}},
  author = {Vazquez-Rodriguez, Juan and Lefebvre, Gr{\'e}goire and Cumin, Julien and Crowley, James L.},
  booktitle = {{Affective Computing and Intelligent Interaction (ACII)}},
  hal_version = {v1},
  hal_id = {hal-04287765},
  pdf = {https://hal.science/hal-04287765v1/file/EmoReconMissMod.pdf},
  keywords = {Multimodal Emotion Recognition ; Machine Learning ; Transformers ; Affective Computing},
  month = {September},
  year = {2023},
  address = {Cambridge (MA), United States},
  url = {https://hal.science/hal-04287765},
  abstract = {Decades of research indicate that emotion recognition is more effective when drawing information from multiple modalities. But what if some modalities are sometimes missing? To address this problem, we propose a novel Transformer-based architecture for recognizing valence and arousal in a time-continuous manner even with missing input modalities. We use a coupling of cross-attention and self-attention mechanisms to emphasize relationships between modalities over time and enhance the learning process on weakly salient inputs. Experimental results on the Ulm-TSST dataset show that our model improves the concordance correlation coefficient by 37% when predicting arousal values and by 30% when predicting valence values, compared to a late-fusion baseline approach.},
}
- Journal TokenCut: Segmenting Objects in Images and Videos with Self-supervised Transformer and Normalized Cut
 Y. Wang, X. Shen, Y. Yuan, Y. Du, M. Li, S. X. Hu, J. L. Crowley and D. Vaufreydaz
 IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 12, pp. 15790 – 15801, December 2023
@article{wang:hal-03765422,
  title = {{TokenCut: Segmenting Objects in Images and Videos with Self-supervised Transformer and Normalized Cut}},
  author = {Wang, Yangtao and Shen, Xi and Yuan, Yuan and Du, Yuming and Li, Maomao and Hu, Shell Xu and Crowley, James L and Vaufreydaz, Dominique},
  journal = {{IEEE Transactions on Pattern Analysis and Machine Intelligence}},
  hal_version = {v3},
  hal_id = {hal-03765422},
  pdf = {https://hal.science/hal-03765422v3/file/TokenCutVideo.pdf},
  doi = {10.1109/TPAMI.2023.3305122},
  month = {December},
  year = {2023},
  pages = {15790 - 15801},
  number = {12},
  volume = {45},
  publisher = {{Institute of Electrical and Electronics Engineers}},
  url = {https://hal.science/hal-03765422},
  abstract = {In this paper, we describe a graph-based algorithm that uses the features obtained by a self-supervised transformer to detect and segment salient objects in images and videos. With this approach, the image patches that compose an image or video are organised into a fully connected graph, where the edge between each pair of patches is labeled with a similarity score between patches using features learned by the transformer. Detection and segmentation of salient objects is then formulated as a graph-cut problem and solved using the classical Normalized Cut algorithm. Despite the simplicity of this approach, it achieves state-of-the-art results on several common image and video detection and segmentation tasks. For unsupervised object discovery, this approach outperforms the competing approaches by a margin of 6.1%, 5.7%, and 2.6%, respectively, when tested with the VOC07, VOC12, and COCO20K datasets. For the unsupervised saliency detection task in images, this method improves the score for Intersection over Union (IoU) by 4.4%, 5.6% and 5.2%, respectively, when tested with the ECSSD, DUTS, and DUT-OMRON datasets, compared to current state-of-the-art techniques. This method also achieves competitive results for unsupervised video object segmentation tasks with the DAVIS, SegTV2, and FBMS datasets.},
}
- Proceedings Combining Public Human Activity Recognition Datasets to Mitigate Labeled Data Scarcity
 R. Presotto, S. Ek, G. Civitarese, F. Portet, P. Lalanda and C. Bettini
 2023 IEEE International Conference on Smart Computing (SMARTCOMP), pp. 33-40, Nashville, United States, June 2023
@inproceedings{presotto:hal-04275000,
  title = {{Combining Public Human Activity Recognition Datasets to Mitigate Labeled Data Scarcity}},
  author = {Presotto, Riccardo and Ek, Sannara and Civitarese, Gabriele and Portet, Fran{\c c}ois and Lalanda, Philippe and Bettini, Claudio},
  booktitle = {{2023 IEEE International Conference on Smart Computing (SMARTCOMP)}},
  hal_version = {v1},
  hal_id = {hal-04275000},
  doi = {10.1109/SMARTCOMP58114.2023.00022},
  month = {June},
  year = {2023},
  pages = {33-40},
  publisher = {{IEEE}},
  address = {Nashville, United States},
  note = {IEEE SMARTCOMP 2023},
  url = {https://hal.science/hal-04275000},
  abstract = {The use of supervised learning for Human Activity Recognition (HAR) on mobile devices leads to strong classification performance. Such an approach, however, requires large amounts of labeled data, both for the initial training of the models and for their customization on specific clients (whose data often differ greatly from the training data). Such data are impractical to obtain due to the costs, intrusiveness, and time-consuming nature of data annotation. Moreover, even with the help of a significant amount of labeled data, model deployment on heterogeneous clients faces difficulties in generalizing well on unseen data. Other domains, like Computer Vision or Natural Language Processing, have proposed the notion of pre-trained models, leveraging large corpora, to reduce the need for annotated data and better manage heterogeneity. This promising approach has not been implemented in the HAR domain so far because of the lack of public datasets of sufficient size. In this paper, we propose a novel strategy to combine publicly available datasets with the goal of learning a generalized HAR model that can be fine-tuned using a limited amount of labeled data on an unseen target domain. Our experimental evaluation, which includes experimenting with different state-of-the-art neural network architectures, shows that combining public datasets can significantly reduce the number of labeled samples required to achieve satisfactory performance on an unseen target domain.},
}
- Proceedings AI-based controller for grid-forming inverter-based generators under extreme dynamics
 H. Issa, V. Debusschere, L. Garbuio, P. Lalanda and N. Hadjsaid
 27th International Conference on Electricity Distribution (CIRED 2023), pp. 1305-1309, Rome, Italy, June 2023
@inproceedings{issa:hal-04479257,
  title = {{AI-based controller for grid-forming inverter-based generators under extreme dynamics}},
  author = {Issa, H. and Debusschere, Vincent and Garbuio, L. and Lalanda, P. and Hadjsaid, N.},
  booktitle = {{27th International Conference on Electricity Distribution (CIRED 2023)}},
  hal_version = {v1},
  hal_id = {hal-04479257},
  pdf = {https://hal.science/hal-04479257v1/file/AI-based%20controller%20for%20grid-forming%20inverter-based%20generators%20under%20extreme%20dynamics.pdf},
  keywords = {Grid-forming ; AI-based control ; Power electronics ; Power systems},
  doi = {10.1049/icp.2023.0701},
  month = {June},
  year = {2023},
  pages = {1305-1309},
  publisher = {{Institution of Engineering and Technology}},
  editor = {IEEE},
  address = {Rome, Italy},
  url = {https://hal.science/hal-04479257},
  abstract = {This paper presents two artificial intelligence (AI)-based controllers for grid-forming inverter-based generators in a simplified microgrid. The supervised learning approach was considered for the AI-based controls. The training datasets were collected from an experimentally validated virtual synchronous generator controller. The first AI model is a convolutional neural network (CNN), and the second is a gated recurrent unit (GRU). These two neural network types are chosen as they handle short-term temporal sequence data, provided as a sliding window of measurements. The proposed controllers are tested under a black start, load variations, and a three-phase-to-ground short circuit at the inverter's output. Both controllers manage to achieve the black start of the inverter, supply the power demanded by the load, damp the short circuit current during a fault occurrence and recover afterward.},
}
- Proceedings DECISION SUPPORT TOOL FOR THE DEVELOPMENT OF POWER DISTRIBUTION NETWORKS BASED ON AI PLANNING
 S. Castellanos, M. Alvarez-Herault and P. Lalanda
 27th International Conference on Electricity Distribution (CIRED 2023), Rome, Italy, June 2023
@inproceedings{castellanos:hal-04287537,
  title = {{DECISION SUPPORT TOOL FOR THE DEVELOPMENT OF POWER DISTRIBUTION NETWORKS BASED ON AI PLANNING}},
  author = {Castellanos, Sandra and Alvarez-Herault, Marie-C{\'e}cile and Lalanda, Philippe},
  booktitle = {{27th International Conference on Electricity Distribution (CIRED 2023)}},
  hal_version = {v1},
  hal_id = {hal-04287537},
  pdf = {https://hal.science/hal-04287537v1/file/10575-Castellanos.pdf},
  month = {June},
  year = {2023},
  address = {Rome, Italy},
  url = {https://hal.science/hal-04287537},
  abstract = {Long-term planning of interconnected electrical distribution networks consists in imagining the evolution of the existing network towards a target network, respecting the objectives and constraints set by the DSO, according to national and local energy evolution scenarios and with a time horizon of several decades. At present, there is no tool that can automatically and optimally propose target networks, as well as the annual intermediate networks that would allow reaching them. Automated planning (AI planning), a sub-field of Artificial Intelligence, is a model-based approach to action selection. The objective of this paper is a first attempt of a decision support tool for DSOs, based on AI planning and electrical computations, to help them finding the aforementioned intermediate networks.},
}
- Proceedings Evaluation of Regularization-based Continual Learning Approaches: Application to HAR
 B. Kann, S. Castellanos-Paez and P. Lalanda
 2023 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), Atlanta, United States, March 2023
@inproceedings{kann:hal-04080925,
  title = {{Evaluation of Regularization-based Continual Learning Approaches: Application to HAR}},
  author = {Kann, Bonpagna and Castellanos-Paez, Sandra and Lalanda, Philippe},
  booktitle = {{2023 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops)}},
  hal_version = {v1},
  hal_id = {hal-04080925},
  pdf = {https://hal.univ-grenoble-alpes.fr/hal-04080925v1/file/Evaluation%20of%20Regularization-based%20Continual%20Learning%20Approaches%20Application%20to%20HAR.pdf},
  keywords = {Continual learning ; regularization methods ; HAR},
  month = {March},
  year = {2023},
  address = {Atlanta, United States},
  url = {https://hal.univ-grenoble-alpes.fr/hal-04080925},
  abstract = {Pervasive computing allows the provision of services in many important areas, including the relevant and dynamic field of health and well-being. In this domain, Human Activity Recognition (HAR) has gained a lot of attention in recent years. Current solutions rely on Machine Learning (ML) models and achieve impressive results. However, the evolution of these models remains difficult, as long as a complete retraining is not performed. To overcome this problem, the concept of Continual Learning is very promising today and, more particularly, the techniques based on regularization. These techniques are particularly interesting for their simplicity and their low cost. Initial studies have been conducted and have shown promising outcomes. However, they remain very specific and difficult to compare. In this paper, we provide a comprehensive comparison of three regularization-based methods that we adapted to the HAR domain, highlighting their strengths and limitations. Our experiments were conducted on the UCI HAR dataset and the results showed that no single technique outperformed all others in all scenarios considered.},
}
- Journal Evaluation and comparison of federated learning algorithms for Human Activity Recognition on smartphones
 S. Ek, F. Portet, P. Lalanda and G. Vega
 Pervasive and Mobile Computing, vol. 87, December 2022
@article{ek:hal-03930017,
  title = {{Evaluation and comparison of federated learning algorithms for Human Activity Recognition on smartphones}},
  author = {Ek, Sannara and Portet, Fran{\c c}ois and Lalanda, Philippe and Vega, German},
  journal = {{Pervasive and Mobile Computing}},
  hal_version = {v1},
  hal_id = {hal-03930017},
  doi = {10.1016/j.pmcj.2022.101714},
  month = {December},
  year = {2022},
  volume = {87},
  publisher = {{Elsevier}},
  url = {https://hal.science/hal-03930017},
  abstract = {Pervasive computing promotes the integration of smart devices in our living spaces to develop services providing assistance to people. Such smart devices are increasingly relying on cloud-based Machine Learning, which raises questions in terms of security (data privacy), reliance (latency), and communication costs. In this context, Federated Learning (FL) has been introduced as a new machine learning paradigm enhancing the use of local devices. At the server level, FL aggregates models learned locally on distributed clients to obtain a more general model. In this way, no private data is sent over the network, and the communication cost is reduced. Unfortunately, however, the most popular federated learning algorithms have been shown not to be adapted to some highly heterogeneous pervasive computing environments. In this paper, we propose a new FL algorithm, termed FedDist, which can modify models (here, deep neural network) during training by identifying dissimilarities between neurons among the clients. This makes it possible to account for clients’ specificity without impairing generalization. FedDist was evaluated with three state-of-the-art federated learning algorithms on three large heterogeneous mobile Human Activity Recognition datasets. Results have shown the ability of FedDist to adapt to heterogeneous data and the capability of FL to deal with asynchronous situations.},
}
- Proceedings Preliminary Study on SSCF-derived Polar Coordinate for ASR
 S. Leang, E. Castelli, D. Vaufreydaz and S. Sam
 ACET 2022, Phnom Penh, Cambodia, December 2022
@inproceedings{leang:hal-03871289,
  title = {{Preliminary Study on SSCF-derived Polar Coordinate for ASR}},
  author = {Leang, Sotheara and Castelli, Eric and Vaufreydaz, Dominique and Sam, Sethserey},
  booktitle = {{ACET 2022}},
  hal_version = {v1},
  hal_id = {hal-03871289},
  pdf = {https://hal.science/hal-03871289v1/file/main.pdf},
  keywords = {Automatic Speech Recognition ; Spectral Subband Centroid Frequency ; Speaker Normalization},
  month = {December},
  year = {2022},
  address = {Phnom Penh, Cambodia},
  url = {https://hal.science/hal-03871289},
  abstract = {The transition angles are defined to describe the vowel-to-vowel transitions in the acoustic space of the Spectral Subband Centroids, and the findings show that they are similar among speakers and speaking rates. In this paper, we propose to investigate the usage of polar coordinates in favor of angles to describe a speech signal by characterizing its acoustic trajectory and using them in Automatic Speech Recognition. According to the experimental results evaluated on the BRAF100 dataset, the polar coordinates achieved significantly higher accuracy than the angles in the mixed and cross-gender speech recognitions, demonstrating that these representations are superior at defining the acoustic trajectory of the speech signal. Furthermore, the accuracy was significantly improved when they were utilized with their first and second-order derivatives (∆, ∆∆), especially in cross-female recognition. However, the results showed they were not much more gender-independent than the conventional Mel-frequency Cepstral Coefficients (MFCCs).},
}
- Proceedings Local and Global Orientation Correction for Oriented Human (Pose) Detection
 R. Ly, D. Vaufreydaz, E. Castelli and S. Sam
 The ASEAN Conference On Emerging Technologies 2022, Phnom Penh, Cambodia, December 2022
@inproceedings{ly:hal-03857012,
  title = {{Local and Global Orientation Correction for Oriented Human (Pose) Detection}},
  author = {Ly, Rottana and Vaufreydaz, Dominique and Castelli, Eric and Sam, Sethserey},
  booktitle = {{The ASEAN Conference On Emerging Technologies 2022}},
  hal_version = {v1},
  hal_id = {hal-03857012},
  keywords = {human detection ; pose detection ; orientation correction ; rotation invariance},
  month = {December},
  year = {2022},
  address = {Phnom Penh, Cambodia},
  url = {https://hal.science/hal-03857012},
  abstract = {Detecting people and/or their pose in real life condition is of interest for several tasks notably for home care for elderly or frail people for instance. In such contexts, the perception system must be able to detect usual but also unusual poses of people in various adversarial conditions. Even if the performance of neural networks on human (pose) detection has significantly increased recently, the human detection in different poses or positions, with partial occlusions, and at multiple scales remains a challenge. In this research, a step towards rotation invariance of human detection is proposed, i.e. to identify a person in a robust way on images containing rotated or oriented human poses. After confirming that data augmentation could not solve this problem, this research explores three ways to address the rotation problem in human pose detection: steerable networks, global rotation correction, and local person orientation approaches. From reported experiments on rotated generated corpora from COCO dataset, there is no significant improvement from integrating steerable approaches into existing architectures. While the state-of-the-art approaches lose up to 67.2 mAP on rotated images, the global rotation correction keeps almost intact the performance for all angles but does not solve the problem of images crowded with several people at various orientations. The local orientation approach permits an average of 4.3 mAP[.5:.95] and 9.5 mAP0.5 gain, and even more when combined with the global approach. The reported experiments indicate that it is possible to achieve rotation invariance. The paper ends by discussing possible improvements to strengthen the rotation invariance in perceiving humans.},
}
- Proceedings Convolutional Time Delay Neural Network for Khmer Automatic Speech Recognition
 N. Srun, S. Leang, Y. Kyaw and S. Sam
 iSAI-NLP-AIoT 2022, Chiang Mai, Thailand, November 2022
@inproceedings{srun:hal-03865538,
  title = {{Convolutional Time Delay Neural Network for Khmer Automatic Speech Recognition}},
  author = {Srun, Nalin and Leang, Sotheara and Kyaw, Ye and Sam, Sethserey},
  booktitle = {{iSAI-NLP-AIoT 2022}},
  hal_version = {v1},
  hal_id = {hal-03865538},
  pdf = {https://hal.univ-grenoble-alpes.fr/hal-03865538v1/file/main.pdf},
  keywords = {Khmer ASR ; Time Delay Neural Network ; Convolutional Neural Network ; Low-resource Language},
  month = {November},
  year = {2022},
  address = {Chiang Mai, Thailand},
  url = {https://hal.univ-grenoble-alpes.fr/hal-03865538},
  abstract = {Convolutional Neural Networks have been proven to successfully capture spatial aspects of the speech signal and eliminate spectral variations across speakers for Automatic Speech Recognition. In this study, we investigate the Convolutional Neural Network with Time Delay Neural Network for an acoustic model to deal with large vocabulary continuous speech recognition for Khmer. Our idea is to use Convolutional Neural Networks to extract local features of the speech signal, whereas Time Delay Neural Networks capture long temporal correlations between acoustic events. The experimental results show that the suggested network outperforms the Time Delay Neural Network and achieves an average relative improvement of 14% across test sets.},
}
- Proceedings Emotion Recognition with Pre-Trained Transformers Using Multimodal Signals
 J. Vazquez-Rodriguez, G. Lefebvre, J. Cumin and J. L. Crowley
 ACII 2022 – 10th International Conference on Affective Computing and Intelligent Interaction, Nara, Japan, October 2022
@inproceedings{vazquezrodriguez:hal-03897400,
  title = {{Emotion Recognition with Pre-Trained Transformers Using Multimodal Signals}},
  author = {Vazquez-Rodriguez, Juan and Lefebvre, Gr{\'e}goire and Cumin, Julien and Crowley, James L},
  booktitle = {{ACII 2022 - 10th International Conference on Affective Computing and Intelligent Interaction}},
  hal_version = {v1},
  hal_id = {hal-03897400},
  pdf = {https://hal.science/hal-03897400v1/file/MultEmoRecon.pdf},
  keywords = {Machine Learning ; Multimodal Emotion Recognition ; Affective Computing},
  month = {October},
  year = {2022},
  address = {Nara, Japan},
  url = {https://hal.science/hal-03897400},
  abstract = {In this paper, we address the problem of multimodal emotion recognition from multiple physiological signals. We demonstrate that a Transformer-based approach is suitable for this task. In addition, we present how such models may be pretrained in a multimodal scenario to improve emotion recognition performances. We evaluate the benefits of using multimodal inputs and pre-training with our approach on a state-of-the-art dataset.},
}
- Proceedings Multimodal Perception and Statistical Modeling of Pedagogical Classroom Events Using a Privacy-safe Non-individual Approach
 A. Augusma
 Doctoral Consortium of 10th International Conference on Affective Computing & Intelligent Interaction (ACII 2022), Nara, Japan, October 2022
@inproceedings{augusma:hal-03886510,
  title = {{Multimodal Perception and Statistical Modeling of Pedagogical Classroom Events Using a Privacy-safe Non-individual Approach}},
  author = {Augusma, Anderson},
  booktitle = {{Doctoral Consortium of 10th International Conference on Affective Computing \& Intelligent Interaction (ACII 2022)}},
  hal_version = {v1},
  hal_id = {hal-03886510},
  pdf = {https://hal.science/hal-03886510v1/file/DC2022_paper_ArXiv%20%283%29.pdf},
  keywords = {Multi-modal ; Context-Aware Classroom (CAC) ; Deep Learning ; Attention-level ; Privacy-safe processing ; Statistical modeling},
  month = {October},
  year = {2022},
  organization = {{Association for the Advancement of Affective Computing (AAAC)}},
  address = {Nara, Japan},
  url = {https://hal.science/hal-03886510},
  abstract = {Interactions between humans are greatly impacted by their behavior. These behaviors can be characterized by signals such as smiling, speech, gaze, posture, gesture, etc. Also by the space, surroundings, time, situation, and context created for a particular activity. These signals also define emotion since they are reactions that human beings experience in response to a particular event or situation. Depending on the event or the circumstance, most of these signals can be triggered. That also happens in pedagogical activities in a classroom. Social learning is multi-modal and teaching itself is complex, these underlying cues are not entirely visible and not immediate. We are investigating Context-Aware Classroom (CAC) to provide a multi-modal perception system allowing to capture pedagogical events that occur in it, to help (young) teachers improve their teaching practices. Thanks to deep learning, which has made great progress over the past two decades, and statistical modeling, it is possible to extract and analyze the signals mentioned above to characterize these events. The main problem with this investigation is the fact that the privacy of the participants may not be preserved. From an ethical point of view, a lot of problems can be caused, i.e, privacy must be taken into account when designing artificial intelligence models. Thus, instead of monitoring individual behavior, the focus will be on global emotion, global student engagement, and the global attention level of the whole class using the signals above mentioned.},
}
- Journal A Hierarchical Framework for Collaborative Artificial Intelligence
 J. L. Crowley, J. L. Coutaz, J. Grosinger, J. Vázquez-Salceda, C. Angulo, A. Sanfeliu, L. Iocchi and A. G. Cohn
 IEEE Pervasive Computing, vol. 22, no. 1, pp. 9-18, March 2023
@article{crowley:hal-03895933,
  title = {{A Hierarchical Framework for Collaborative Artificial Intelligence}},
  author = {Crowley, James L. and Coutaz, Jo{\"e}lle L and Grosinger, Jasmin and V{\'a}zquez-Salceda, Javier and Angulo, Cecilio and Sanfeliu, Alberto and Iocchi, Luca and Cohn, Anthony G.},
  journal = {{IEEE Pervasive Computing}},
  hal_version = {v1},
  hal_id = {hal-03895933},
  pdf = {https://hal.univ-grenoble-alpes.fr/hal-03895933v1/file/CollaborativeIntelligentSystems-Nov2022.pdf},
  doi = {10.1109/MPRV.2022.3208321},
  month = {March},
  year = {2023},
  pages = {9-18},
  number = {1},
  volume = {22},
  publisher = {{Institute of Electrical and Electronics Engineers}},
  url = {https://hal.univ-grenoble-alpes.fr/hal-03895933},
  abstract = {We propose a hierarchical framework for collaborative intelligent systems. This framework organizes research challenges based on the nature of the collaborative activity and the information that must be shared, with each level building on capabilities provided by lower levels. We review research paradigms at each level, with a description of classical engineering-based approaches and modern alternatives based on machine learning, illustrated with a running example using a hypothetical personal service robot. We discuss cross-cutting issues that occur at all levels, focusing on the problem of communicating and sharing comprehension, the role of explanation and the social nature of collaboration. We conclude with a summary of research challenges and a discussion of the potential for economic and societal impact provided by technologies that enhance human abilities and empower people and society through collaboration with Intelligent Systems.},
}We propose a hierarchical framework for collaborative intelligent systems. This framework organizes research challenges based on the nature of the collaborative activity and the information that must be shared, with each level building on capabilities provided by lower levels. We review research paradigms at each level, with a description of classical engineering-based approaches and modern alternatives based on machine learning, illustrated with a running example using a hypothetical personal service robot. We discuss cross-cutting issues that occur at all levels, focusing on the problem of communicating and sharing comprehension, the role of explanation and the social nature of collaboration. We conclude with a summary of research challenges and a discussion of the potential for economic and societal impact provided by technologies that enhance human abilities and empower people and society through collaboration with Intelligent Systems. 
- Proceedings Context-Aware Classrooms as Places for an Automated Analysis of Instructional Events
 P. Dessus
 Conference on Smart Learning Ecosystems and Regional Development, pp. 1-12, Bucharest, Romania, September 2022
@proceedings{dessus:hal-03790474,
  title = {{Context-Aware Classrooms as Places for an Automated Analysis of Instructional Events}},
  author = {Dessus, Philippe},
  booktitle = {{Conference on Smart Learning Ecosystems and Regional Development}},
  hal_version = {v1},
  hal_id = {hal-03790474},
  pdf = {https://hal.univ-grenoble-alpes.fr/hal-03790474v1/file/paper-3-v-pub.pdf},
  keywords = {Context-Aware Classrooms ; Observation systems ; Behaviorism ; Ecology ; Enactivism},
  doi = {10.1007/978-981-19-5240-1\_1},
  month = {September},
  year = {2022},
  pages = {1-12},
  publisher = {{Springer}},
  address = {Bucharest, Romania},
  url = {https://hal.univ-grenoble-alpes.fr/hal-03790474},
  abstract = {Context-Aware Classrooms (CACs), or ambient classrooms, are places in which instructional events can be captured and analyzed, thanks to advanced signal processing techniques. For CACs to be used for a better understanding of the educational events (teaching or learning), theoretically grounded approaches have to be reviewed and their main variables of interest presented. In this paper, three types of approaches to study the use of CACs (behavioral, ecological, and enactivist) are discussed, first theoretically, then about what each approach brings to the research on educational research. Some implications to build more ecologically-sound in presence or hybrid instructional sessions after the COVID-19 are drawn.},
}
- Proceedings Transformer-Based Self-Supervised Learning for Emotion Recognition
 J. Vazquez-Rodriguez, G. Lefebvre, J. Cumin and J. L. Crowley
 26th International Conference on Pattern Recognition (ICPR 2022), Montreal, Canada, August 2022
@inproceedings{vazquezrodriguez:hal-03634490,
  title = {{Transformer-Based Self-Supervised Learning for Emotion Recognition}},
  author = {Vazquez-Rodriguez, Juan and Lefebvre, Gr{\'e}goire and Cumin, Julien and Crowley, James L.},
  booktitle = {{26th International Conference on Pattern Recognition (ICPR 2022)}},
  hal_version = {v2},
  hal_id = {hal-03634490},
  pdf = {https://hal.science/hal-03634490v2/file/Trnsf_based_EmoRec.pdf},
  keywords = {Emotion recognition ; Machine learning ; Unsupervised learning ; Transformers},
  month = {August},
  year = {2022},
  address = {Montreal, Canada},
  url = {https://hal.science/hal-03634490},
  abstract = {In order to exploit representations of time-series signals, such as physiological signals, it is essential that these representations capture relevant information from the whole signal. In this work, we propose to use a Transformer-based model to process electrocardiograms (ECG) for emotion recognition. Attention mechanisms of the Transformer can be used to build contextualized representations for a signal, giving more importance to relevant parts. These representations may then be processed with a fully-connected network to predict emotions. To overcome the relatively small size of datasets with emotional labels, we employ self-supervised learning. We gathered several ECG datasets with no labels of emotion to pre-train our model, which we then fine-tuned for emotion recognition on the AMIGOS dataset. We show that our approach reaches state-of-the-art performances for emotion recognition using ECG signals on AMIGOS. More generally, our experiments show that transformers and pre-training are promising strategies for emotion recognition with physiological signals.},
}
- Journal UsyBus: A Communication Framework among Reusable Agents integrating Eye-Tracking in Interactive Applications
 F. Jambon and J. Vanderdonckt
 Proceedings of the ACM on Human-Computer Interaction, vol. 6, no. EICS, pp. 1-36, June 2022
@article{jambon:hal-03698671,
  title = {{UsyBus: A Communication Framework among Reusable Agents integrating Eye-Tracking in Interactive Applications}},
  author = {Jambon, Francis and Vanderdonckt, Jean},
  journal = {{Proceedings of the ACM on Human-Computer Interaction}},
  hal_version = {v1},
  hal_id = {hal-03698671},
  keywords = {Communication data bus ; Eye movement analysis ; Oculometry ; Eye-tracker ; Eye-tracking ; Usability engineering},
  doi = {10.1145/3532207},
  month = {June},
  year = {2022},
  pages = {1-36},
  number = {EICS},
  volume = {6},
  publisher = {{Association for Computing Machinery (ACM)}},
  note = {Article No.: 157},
  url = {https://hal.science/hal-03698671},
  abstract = {Eye movement analysis is a popular method to evaluate whether a user interface meets the users' requirements and abilities. However, with current tools, setting up a usability evaluation with an eye-tracker is resource-consuming, since the areas of interest are defined manually, exhaustively and redefined each time the user interface changes. This process is also error-prone, since eye movement data must be finely synchronised with user interface changes. These issues become more serious when the user interface layout changes dynamically in response to user actions. In addition, current tools do not allow easy integration into interactive applications, and opportunistic code must be written to link these tools to user interfaces. To address these shortcomings and to leverage the capabilities of eye-tracking, we present UsyBus, a communication framework for autonomous, tight coupling among reusable agents. These agents are responsible for collecting data from eye-trackers, analyzing eye movements, and managing communication with other modules of an interactive application. UsyBus allows multiple heterogeneous eye-trackers as input, provides multiple configurable outputs depending on the data to be exploited. Modules exchange data based on the UsyBus communication framework, thus creating a customizable multi-agent architecture. UsyBus application domains range from usability evaluation to gaze interaction applications design. Two case studies, composed of reusable modules from our portfolio, exemplify the implementation of the UsyBus framework.},
}
- Proceedings Federated Continual Learning through distillation in pervasive computing
 A. Usmanova, F. Portet, P. Lalanda and G. Vega
 SMARTCOMP2022, pp. 86-91, Espoo, Finland, June 2022
@inproceedings{usmanova:hal-03727252,
  title = {{Federated Continual Learning through distillation in pervasive computing}},
  author = {Usmanova, Anastasiia and Portet, Fran{\c c}ois and Lalanda, Philippe and Vega, German},
  booktitle = {{SMARTCOMP2022}},
  hal_version = {v1},
  hal_id = {hal-03727252},
  month = {June},
  year = {2022},
  pages = {86 -- 91},
  address = {Espoo, Finland},
  url = {https://hal.science/hal-03727252},
  abstract = {Federated Learning has been introduced as a new machine learning paradigm enhancing the use of local devices. At a server level, FL regularly aggregates models learned locally on distributed clients to obtain a more general model. Current solutions rely on the availability of large amounts of stored data at the client side in order to fine-tune the models sent by the server. Such setting is not realistic in mobile pervasive computing where data storage must be kept low and data characteristic can change dramatically. To account for this variability, a solution is to use the data regularly collected by the client to progressively adapt the received model. But such naive approach exposes clients to the well-known problem of catastrophic forgetting. To address this problem, we have defined a Federated Continual Learning approach which is mainly based on distillation. Our approach allows a better use of resources, eliminating the need to retrain from scratch at the arrival of new data and reducing memory usage by limiting the amount of data to be stored. This proposal has been evaluated in the Human Activity Recognition (HAR) domain and has shown to effectively reduce the catastrophic forgetting effect.},
}
- Proceedings Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut
 Y. Wang, X. Shen, S. Hu, Y. Yuan, J. L. Crowley and D. Vaufreydaz
 CVPR 2022 – Conference on Computer Vision and Pattern Recognition, New Orleans, United States, June 2022
@inproceedings{wang:hal-03585410,
  title = {{Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut}},
  author = {Wang, Yangtao and Shen, Xi and Hu, Shell and Yuan, Yuan and Crowley, James L. and Vaufreydaz, Dominique},
  booktitle = {{CVPR 2022 - Conference on Computer Vision and Pattern Recognition}},
  hal_version = {v2},
  hal_id = {hal-03585410},
  pdf = {https://inria.hal.science/hal-03585410v2/file/TokenCut.pdf},
  keywords = {Object Discovery ; Unsupervised Learning ; Transformer},
  doi = {10.1109/cvpr52688.2022.01414},
  month = {June},
  year = {2022},
  address = {New Orleans, United States},
  url = {https://inria.hal.science/hal-03585410},
  abstract = {Transformers trained with self-supervised learning using self-distillation loss (DINO) have been shown to produce attention maps that highlight salient foreground objects. In this paper, we demonstrate a graph-based approach that uses the self-supervised transformer features to discover an object from an image. Visual tokens are viewed as nodes in a weighted graph with edges representing a connectivity score based on the similarity of tokens. Foreground objects can then be segmented using a normalized graph-cut to group self-similar regions. We solve the graph-cut problem using spectral clustering with generalized eigen-decomposition and show that the second smallest eigenvector provides a cutting solution since its absolute value indicates the likelihood that a token belongs to a foreground object. Despite its simplicity, this approach significantly boosts the performance of unsupervised object discovery: we improve over the recent state of the art LOST by a margin of 6.9%, 8.1%, and 8.1% respectively on the VOC07, VOC12, and COCO20K. The performance can be further improved by adding a second stage class-agnostic detector (CAD). Our proposed method can be easily extended to unsupervised saliency detection and weakly supervised object detection. For unsupervised saliency detection, we improve IoU for 4.9%, 5.2%, 12.9% on ECSSD, DUTS, DUT-OMRON respectively compared to previous state of the art. For weakly supervised object detection, we achieve competitive performance on CUB and ImageNet.},
}
- Journal SocialInteractionGAN: Multi-person Interaction Sequence Generation
 L. Airale, D. Vaufreydaz and X. Alameda-Pineda
 IEEE Transactions on Affective Computing, pp. 2182-2192, May 2022
@article{airale:hal-03163467,
  title = {{SocialInteractionGAN: Multi-person Interaction Sequence Generation}},
  author = {Airale, Louis and Vaufreydaz, Dominique and Alameda-Pineda, Xavier},
  journal = {{IEEE Transactions on Affective Computing}},
  hal_version = {v2},
  hal_id = {hal-03163467},
  pdf = {https://inria.hal.science/hal-03163467v2/file/SocialInteractionGAN.pdf},
  keywords = {Multi-person interactions ; discrete sequence generation ; adversarial learning},
  doi = {10.1109/TAFFC.2022.3171719},
  month = {May},
  year = {2022},
  pages = {2182-2192},
  publisher = {{Institute of Electrical and Electronics Engineers}},
  url = {https://inria.hal.science/hal-03163467},
  abstract = {Prediction of human actions in social interactions has important applications in the design of social robots or artificial avatars. In this paper, we focus on a unimodal representation of interactions and propose to tackle interaction generation in a data-driven fashion. In particular, we model human interaction generation as a discrete multi-sequence generation problem and present SocialInteractionGAN, a novel adversarial architecture for conditional interaction generation. Our model builds on a recurrent encoder-decoder generator network and a dual-stream discriminator, that jointly evaluates the realism of interactions and individual action sequences and operates at different time scales. Crucially, contextual information on interacting participants is shared among agents and reinjected in both the generation and the discriminator evaluation processes. Experiments show that albeit dealing with low dimensional data, SocialInteractionGAN succeeds in producing high realism action sequences of interacting people, comparing favorably to a diversity of recurrent and convolutional discriminator baselines, and we argue that this work will constitute a first stone towards higher dimensional and multimodal interaction generation. Evaluations are conducted using classical GAN metrics, that we specifically adapt for discrete sequential data. Our model is shown to properly learn the dynamics of interaction sequences, while exploiting the full range of available actions.},
}
- Proceedings Evaluation des systèmes de recherche d’information interactifs : exemple du Bouclage de Pertinence implicite par suivi oculaire
 P. Mulhem, F. Jambon and L. Albarede
 Journée Accès Interactif à l’Information, pp. 14-16, Paris, France, April 2022
@inproceedings{mulhem:hal-03634132,
  title = {{Evaluation des syst{\`e}mes de recherche d'information interactifs : exemple du Bouclage de Pertinence implicite par suivi oculaire}},
  author = {Mulhem, Philippe and Jambon, Francis and Albarede, Lucas},
  booktitle = {{Journ{\'e}e Acc{\`e}s Interactif {\`a} l'Information}},
  hal_version = {v1},
  hal_id = {hal-03634132},
  pdf = {https://hal.science/hal-03634132v1/file/JAII_mulhem_jambon_albarede.pdf},
  month = {April},
  year = {2022},
  pages = {14-16},
  address = {Paris, France},
  url = {https://hal.science/hal-03634132},
  abstract = {},
}
- Proceedings Federated Self-Supervised Learning in Heterogeneous Settings: Limits of a Baseline Approach on HAR
 S. Ek, R. Rombourg, F. Portet and P. Lalanda
 2022 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), pp. 557-562, Pisa, Italy, March 2022
@inproceedings{ek:hal-03727244,
  title = {{Federated Self-Supervised Learning in Heterogeneous Settings: Limits of a Baseline Approach on HAR}},
  author = {Ek, Sannara and Rombourg, Romain and Portet, Fran{\c c}ois and Lalanda, Philippe},
  booktitle = {{2022 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops)}},
  hal_version = {v1},
  hal_id = {hal-03727244},
  doi = {10.1109/PerComWorkshops53856.2022.9767369},
  month = {March},
  year = {2022},
  pages = {557-562},
  publisher = {{IEEE}},
  address = {Pisa, Italy},
  note = {S. Ek, R. Rombourg, F. Portet and P. Lalanda, ''Federated Self-Supervised Learning in Heterogeneous Settings: Limits of a Baseline Approach on HAR,'' 2022 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), 2022, pp. 557-562},
  url = {https://hal.science/hal-03727244},
  abstract = {Federated Learning is a new machine learning paradigm dealing with distributed model learning on independent devices. One of the many advantages of federated learning is that training data stay on devices (such as smartphones), and only learned models are shared with a centralized server. In the case of supervised learning, labeling is entrusted to the clients. However, acquiring such labels can be prohibitively expensive and error-prone for many tasks, such as human activity recognition. Hence, a wealth of data remains unlabelled and unexploited. Most existing federated learning approaches that focus mainly on supervised learning have mostly ignored this mass of unlabelled data. Furthermore, it is unclear whether standard federated Learning approaches are suited to self-supervised learning. The few studies that have dealt with the problem have limited themselves to the favorable situation of homogeneous datasets. This work lays the groundwork for a reference evaluation of federated Learning with Semi-Supervised Learning in a realistic setting. We show that standard lightweight autoencoder and standard Federated Averaging fail to learn a robust representation for Human Activity Recognition with several realistic heterogeneous datasets. These findings advocate for a more intensive research effort in Federated Self Supervised Learning to exploit the mass of heterogeneous unlabelled data present on mobile devices.},
}
- Proceedings Federated Learning and catastrophic forgetting in pervasive computing: demonstration in HAR domain
 A. Usmanova, F. Portet, P. Lalanda and G. Vega
 2022 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), pp. 310-315, Pisa, Italy, March 2022
@inproceedings{usmanova:hal-03727275,
  title = {{Federated Learning and catastrophic forgetting in pervasive computing: demonstration in HAR domain}},
  author = {Usmanova, Anastasiia and Portet, Fran{\c c}ois and Lalanda, Philippe and Vega, German},
  booktitle = {{2022 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops)}},
  hal_version = {v1},
  hal_id = {hal-03727275},
  doi = {10.1109/PerComWorkshops53856.2022.9767246},
  month = {March},
  year = {2022},
  pages = {310-315},
  publisher = {{IEEE}},
  address = {Pisa, Italy},
  note = {A. Usmanova, F. Portet, P. Lalanda and G. Vega, ''Federated Learning and catastrophic forgetting in pervasive computing: demonstration in HAR domain,'' 2022 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), 2022, pp. 310-315},
  url = {https://hal.science/hal-03727275},
  abstract = {Federated Learning has been introduced as a new machine learning paradigm enhancing the use of local devices. At a server level, FL regularly aggregates models learned locally on distributed clients to obtain a more general model. In this way, no private data is sent over the network, and the communication cost is reduced. However, current solutions rely on the availability of large amounts of stored data at the client side in order to fine-tune the models sent by the server. Such setting is not realistic in mobile pervasive computing where data storage must be kept low and data characteristic (distribution) can change dramatically. To account for this variability, a solution is to use the data regularly collected by the client to progressively adapt the received model. But such naive approach exposes clients to the well-known problem of catastrophic forgetting. The purpose of this paper is to demonstrate this problem in the mobile human activity recognition context on smartphones.},
}
- Preprint Composing Complex and Hybrid AI Solutions
 P. Schüller, J. Paulo Costeira, J. L. Crowley, J. Grosinger, F. Ingrand, U. Köckemann, A. Saffiotti and M. Welss
 February 2022
@unpublished{schuller:hal-03590739,
  title = {{Composing Complex and Hybrid AI Solutions}},
  author = {Sch{\"u}ller, Peter and Paulo Costeira, Jo{\~a}o and Crowley, James L. and Grosinger, Jasmin and Ingrand, F{\'e}lix and K{\"o}ckemann, Uwe and Saffiotti, Alessandro and Welss, Martin},
  hal_version = {v1},
  hal_id = {hal-03590739},
  pdf = {https://laas.hal.science/hal-03590739v1/file/2202.12566.pdf},
  month = {February},
  year = {2022},
  hal_local_reference = {Rapport LAAS n{\textdegree} 22041},
  note = {working paper or preprint},
  url = {https://laas.hal.science/hal-03590739},
  abstract = {Progress in several areas of computer science has been enabled by comfortable and efficient means of experimentation, clear interfaces, and interchangeable components, for example using OpenCV for computer vision or ROS for robotics. We describe an extension of the Acumos system towards enabling the above features for general AI applications. Originally, Acumos was created for telecommunication purposes, mainly for creating linear pipelines of machine learning components. Our extensions include support for more generic components with gRPC/Protobuf interfaces, automatic orchestration of graphically assembled solutions including control loops, sub-component topologies, and event-based communication, and provisions for assembling solutions which contain user interfaces and shared storage areas. We provide examples of deployable solutions and their interfaces. The framework is deployed at http://aiexp.ai4europe.eu/ and its source code is managed as an open source Eclipse project.},
}
- Report Sur les traces de Mobi’Kids. L’enfant autonome au défi de la ville
 S. Depeau, I. I. André-Poyaud, N. Audas, H. Bailleul, O. Bedel, A. Boumoud, S. Chardonnel, P. Cherel, A. Desgans, T. Devogele, P. Dias, S. Duroudier, L. Etienne, B. Feildel, F. Jambon, F. Kasse Serigne, C. Kerouanton, M. Le Goc, M. Le Magadou, A. Lepetit, F. Leprince, T. Manola, J. Mcoisans, B. Mericskay, E. Moffat, C. Moreau, E. Ployon, N. Robinet, K. Tabaka and J. Thibaud
 pp. 75, 2022
@techreport{depeau:hal-04284882,
  title = {{Sur les traces de Mobi'Kids. L'enfant autonome au d{\'e}fi de la ville}},
  author = {Depeau, Sandrine and Andr{\'e}-Poyaud, Isabelle I. and Audas, Nathalie and Bailleul, H{\'e}l{\`e}ne and Bedel, Olivier and Boumoud, Abdelhakim and Chardonnel, Sonia and Cherel, Pierre and Desgans, Aur{\`e}le and Devogele, Thomas and Dias, Pierre and Duroudier, Sylvestre and Etienne, Laurent and Feildel, Beno{\^i}t and Jambon, Francis and Kasse Serigne, Fallou and Kerouanton, Colin and Le Goc, Manon and Le Magadou, Ma{\"e}lys and Lepetit, Arnaud and Leprince, Fran{\c c}oise and Manola, Th{\'e}a and Mcoisans, Jul and Mericskay, Boris and Moffat, Eve and Moreau, Cl{\'e}ment and Ployon, Estelle and Robinet, Nicolas and Tabaka, Kamila and Thibaud, Jean-Paul},
  hal_version = {v1},
  hal_id = {hal-04284882},
  pdf = {https://hal.science/hal-04284882v1/file/2022_Livret%20Mobikids-VF.pdf},
  keywords = {enfant ; autonomie ; parcours ; apprentissage ; GPS ; espace urbain ; enqu{\^e}te ; mobilit{\'e}},
  year = {2022},
  institution = {{ESO (CNRS, Universit{\'e} Rennes2) ; PACTE (CNRS, UGA, Sciences Po Grenoble) ; AAU (CNRS, Centrale Nantes, ENSA Nantes et Grenoble, UGA) ; LIFAT (Universit{\'e} de Tours) ; (PME) ALKANTE ; PME, RF TRACK PME}},
  pages = {75},
  url = {https://hal.science/hal-04284882},
  abstract = {Que signifie être autonome en ville pour les enfants ? Comment les enfants développent-ils leurs rapports aux espaces de la ville ? Comment parcourent-ils la ville ? Quelles différences de déplacements observe-t-on en fonction des contextes de vie ? Quel est le rôle des parents et des cultures éducatives dans cet apprentissage des espaces extérieurs ? Comment une équipe de scientifiques a-t-elle procédé pour enquêter, comprendre et analyser ses questions d’autonomie et de mobilités enfantines dans la métropole rennaise ? Quels dispositifs d’enquêtes et moyens techniques a-t-elle mis en oeuvre ? Combien de familles ont décidé de participer à cette enquête au long cours ? Partez à la découverte du programme Mobi'kids.},
}
- Journal L’instrumentation intelligente des salles de classe au service de l’observation des interactions enseignant-apprenants
 R. Laurent, P. Dessus and D. Vaufreydaz
 Revue internationale de communication et socialisation, vol. 9, no. 2, pp. 247-258, 2022
@article{laurent:hal-03985556,
  title = {{L'instrumentation intelligente des salles de classe au service de l'observation des interactions enseignant-apprenants}},
  author = {Laurent, Romain and Dessus, Philippe and Vaufreydaz, Dominique},
  journal = {{Revue internationale de communication et socialisation}},
  hal_version = {v2},
  hal_id = {hal-03985556},
  pdf = {https://hal.univ-grenoble-alpes.fr/hal-03985556v2/file/Laurent%20et%20al-RICS_final-2.pdf},
  keywords = {computational observation ; Teacher-Student Relationship (TSR) ; classroom ecology ; Relation enseignant-apprenants ; observation computationnelle ; {\'e}cologie de la salle de classe},
  year = {2022},
  pages = {247-258},
  number = {2},
  volume = {9},
  publisher = {{J.C. Kalubi}},
  url = {https://hal.univ-grenoble-alpes.fr/hal-03985556},
  abstract = {The quality of the relationship between teacher and learners is a key factor in improving learning. While this relationship remains observable through several classical methods (self- and hetero-reported), the recent introduction of computer vision in classrooms is likely to considerably extend the investigation of its interactional component, the most obvious and immediate dimension of relationships between learners and teachers (Pianta, 1999). However, the deployment of cameras feeding artificial intelligence processes in classrooms divides the scientific community. Between critics worried about data surveillance and advocates of the adaptive prospects offered by a teacher informed in real time of learners' states, even hidden ones, it seems possible to us to draw a line of demarcation along which the impact of such so-called computational instrumentation would be questioned and negotiated with regard to preserving classroom ecology.},
}
- Journal BOARD-AI: A goal-aware modeling interface for systems engineering, combining machine learning and plan recognition
 S. Castellanos-Paez, N. Hili, A. Albore and M. Pérez-Sanagustin
 Frontiers in Physics, vol. 10, pp. 10:944086, October 2022
@article{castellanospaez:hal-03882810,
  title = {{BOARD-AI: A goal-aware modeling interface for systems engineering, combining machine learning and plan recognition}},
  author = {Castellanos-Paez, Sandra and Hili, Nicolas and Albore, Alexandre and P{\'e}rez-Sanagustin, Mar},
  journal = {{Frontiers in Physics}},
  hal_version = {v1},
  hal_id = {hal-03882810},
  pdf = {https://hal.science/hal-03882810v1/file/DTIS22226.pdf},
  keywords = {machine learning ; plan recognition ; artificial intelligence ; systems engineering ; sketch recognition ; model-driven development},
  doi = {10.3389/fphy.2022.944086},
  month = {October},
  year = {2022},
  pages = {10:944086},
  volume = {10},
  publisher = {{Frontiers}},
  note = {Ed. by William Frere Lawless},
  url = {https://hal.science/hal-03882810},
  abstract = {Paper and pens remain the most commonly used tools by systems engineers to capture system models. They improve productivity and foster collaboration and creativity as the users do not need to conform to formal notations commonly present in Computer-Aided Systems Engineering (CASE) tools for system modeling. However, digitizing models sketched on a whiteboard into CASE tools remains a difficult and error-prone activity that requires the knowledge of tool experts. Over the past decade, switching from symbolic reasoning to machine learning has been the natural choice in many domains to improve the performance of software applications. The field of natural sketching and online recognition is no exception to the rule and most of the existing sketch recognizers rely on pre-trained sets of symbols to increase the confidence in the outcome of the recognizers. However, that performance improvement comes at the cost of trust. The lack of trust directly stems from the lack of explainability of the outcomes of the neural networks, which hinders its acceptance by systems engineering teams. A solution shall not only combine the performance and robustness but shall also earn unreserved support and trust from human users. While most of the works in the literature tip the scale in favor of performance, there is a need to better include studies on human perception into the equation to restore balance. This study presents an approach and a Human-machine interface for natural sketching that allows engineers to capture system models using interactive whiteboards. The approach combines techniques from symbolic AI and machine learning to improve performance while not compromising explainability. The key concept of the approach is to use a trained neural network to separate, upstream from the global recognition process, handwritten text from geometrical symbols, and to use the suitable technique (OCR or automated planning) to recognize text and symbols individually. 
Key advantages of the approach are that it does not resort to any other interaction modalities (e.g., virtual keyboards) to annotate model elements with textual properties and that the explainability of the outcomes of the modeling assistant is preserved. A user experiment validates the usability of the interface.},
}
- Preprint Lightweight Transformers for Human Activity Recognition on Mobile Devices
 S. Ek, F. Portet and P. Lalanda
 2022
@unpublished{https://doi.org/10.48550/arxiv.2209.11750,
  title = {Lightweight Transformers for Human Activity Recognition on Mobile Devices},
  author = {Ek, Sannara and Portet, François and Lalanda, Philippe},
  copyright = {arXiv.org perpetual, non-exclusive license},
  year = {2022},
  publisher = {arXiv},
  keywords = {Computer Vision and Pattern Recognition (cs.CV), Artificial Intelligence (cs.AI), Machine Learning (cs.LG), FOS: Computer and information sciences},
  url = {https://arxiv.org/abs/2209.11750},
  doi = {10.48550/ARXIV.2209.11750},
}
- Preprint Video Lecture Design and Student Engagement: Analysis of Visual Attention, Affect, Satisfaction, and Learning Outcomes
 L. Lassance, L. V. L. Filgueiras, P. Dessus, T. Guntz and J. L. Crowley
 2022
@unpublished{lassance:hal-03775919,
  title = {{Video Lecture Design and Student Engagement: Analysis of Visual Attention, Affect, Satisfaction, and Learning Outcomes}},
  author = {Lassance, Laura and Filgueiras, Lucia Vilela Leite and Dessus, Philippe and Guntz, Thomas and Crowley, James L.},
  hal_version = {v1},
  hal_id = {hal-03775919},
  keywords = {affective learning ; Cognitive load ; e-learning ; Engagement ; Instructor presence ; Multimedia ; Multimodal analysis ; Video lecture design},
  doi = {10.31234/osf.io/qkynw},
  year = {2022},
  note = {working paper or preprint},
  url = {https://hal.univ-grenoble-alpes.fr/hal-03775919},
  abstract = {The growing availability of online multimedia instruction, such as Massive Open Online Courses (MOOCs), marks a revolutionary new phase in the use of technology for education. Considering the high student attrition in MOOCs, it is crucial to study how students engage and disengage during their learning experience in relation to the video lecture design. The present study conducted a pilot user experiment (n = 24) to evaluate in a multimodal way which video lecture design is more effective for learning. Two video lecture designs were scrutinized: voice over slides, and slides overlaid by picture-in-picture instructor video. The experimental setup included different tracking technologies and sensory modalities to gather synchronized data from the learning experience: eye-tracker, Kinect, frontal camera, and screen recording. Among the measures, eye-gaze observational data, facial expressions, and self-reported perceptions were analyzed and compared against the learning assessment results. Based on these results, engagement is discussed regarding the different video lecture designs by connecting the observational and self-reported data to the short-term learning outcomes.},
}
- Journal Analyser automatiquement les signaux de l’enseignement : Une approche d’apprentissage social fondée sur les preuves
 R. Laurent, P. Dessus and D. Vaufreydaz
 A.N.A.E. Approche neuropsychologique des apprentissages chez l’enfant, pp. 29-36, 2022
@article{laurent:hal-03599280,
  title = {{Analyser automatiquement les signaux de l'enseignement : Une approche d'apprentissage social fond{\'e}e sur les preuves}},
  author = {Laurent, Romain and Dessus, Philippe and Vaufreydaz, Dominique},
  journal = {{A.N.A.E. Approche neuropsychologique des apprentissages chez l'enfant}},
  hal_version = {v2},
  hal_id = {hal-03599280},
  pdf = {https://hal.univ-grenoble-alpes.fr/hal-03599280v2/file/ANAE-HAL.pdf},
  keywords = {Social Learning ; Machine Learning ; Signal Processing and Analysis ; Pedagogy ; Evidence-based Education ; Apprentissage machine ; Traitement et analyse du signal ; P{\'e}dagogie ; {\'E}ducation fond{\'e}e sur les preuves ; Apprentissage social},
  year = {2022},
  pages = {29-36},
  number = {176},
  publisher = {{St{\'e} Artemis [1989] - J. Libbey Eurotext [1990-1993] - PDG Communication [1994-2002] - Pleiomedia [2003-....]}},
  url = {https://hal.univ-grenoble-alpes.fr/hal-03599280},
  abstract = {Recent advances in signal processing and analysis have made it possible to create new ways of instrumenting the observation and the analysis of educational events, and thus to gather new kinds of evidence on teaching and learning practice. This article identifies some of these, based on a “social learning” framework, which posits that pedagogy is a social activity embedded in everyday life, and relies on certain innate human capacities.},
}
- Journal Les leçons d’AltSchool : Comment compromettre les relations enseignant-élèves sous prétexte d’innover ?
 P. Dessus
 Vivre le primaire, vol. 34, no. 4, pp. 84-85, December 2021
@article{dessus:hal-03494883,
  title = {{Les le{\c c}ons d'AltSchool : Comment compromettre les relations enseignant-{\'e}l{\`e}ves sous pr{\'e}texte d'innover ?}},
  author = {Dessus, Philippe},
  journal = {{Vivre le primaire}},
  hal_version = {v1},
  hal_id = {hal-03494883},
  pdf = {https://hal.univ-grenoble-alpes.fr/hal-03494883v1/file/art-v-finale.pdf},
  keywords = {Vie priv{\'e}e ; {\'E}thique ; Relations enseignant-{\'e}l{\`e}ves ; Classes sensibles au contexte},
  month = {December},
  year = {2021},
  pages = {84-85},
  number = {4},
  volume = {34},
  publisher = {{Association Qu{\'e}b{\'e}coise des Enseignants du Primaire (AQEP)}},
  url = {https://hal.univ-grenoble-alpes.fr/hal-03494883},
  abstract = {},
}
- Journal Sciences sociales et apprentissage machine pour l’interaction
 D. Vaufreydaz
 Interstices, September 2021
@article{vaufreydaz:hal-03363875,
  title = {{Sciences sociales et apprentissage machine pour l'interaction}},
  author = {Vaufreydaz, Dominique},
  journal = {{Interstices}},
  hal_version = {v1},
  hal_id = {hal-03363875},
  keywords = {Apprentissage machine deep learning ; Sciences humaines \& sociales ; Interactions Homme-Machine ; Robot},
  month = {September},
  year = {2021},
  publisher = {{INRIA}},
  url = {https://inria.hal.science/hal-03363875},
  abstract = {Le machine learning a aujourd'hui fait preuve de son efficacité : on peut produire, à partir d'une grande masse d'informations, des Intelligences Artificielles capables de répondre à de nombreux besoins, comme le montrent les progrès en vision par ordinateur ou en traduction automatique ces dernières années. Pour autant, cette technique a des limites, vis-à-vis des secteurs ne disposant pas de suffisamment de données, vis-à-vis de certaines questions éthiques, et vis-à-vis de son explicabilité. Pour pallier ces problèmes dans les applications où le Machine Learning seul n’est pas efficient, les sciences humaines peuvent apporter des solutions et de la précision aux systèmes automatiques. À l'aide de deux exemples concrets, Dominique Vaufreydaz illustre comment les apports des sciences humaines peuvent nourrir et améliorer un programme informatique dédié aux interactions avec les humains.},
}
- Proceedings Comment instrumenter l’observation et l’analyse de la REE ?
 P. Dessus
 Les systèmes éducatifs québécois et français sous l’angle de la relation enseignant-apprenants : enjeux et impacts, Montréal, Canada, September 2021
@inproceedings{dessus:hal-03359291,
  title = {{Comment instrumenter l'observation et l'analyse de la REE ?}},
  author = {Dessus, Philippe},
  booktitle = {{Les syst{\`e}mes {\'e}ducatifs qu{\'e}b{\'e}cois et fran{\c c}ais sous l'angle de la relation enseignant-apprenants : enjeux et impacts}},
  hal_version = {v1},
  hal_id = {hal-03359291},
  keywords = {Relations enseignant-{\'e}l{\`e}ve ; Syst{\`e}mes d'observation ; Salles ambiantes},
  month = {September},
  year = {2021},
  organization = {{S{\'e}verine Ha{\"i}at and Annie Charron}},
  address = {Montr{\'e}al, Canada},
  url = {https://hal.science/hal-03359291},
  abstract = {Cette intervention passe en revue les outils qui peuvent aider à l'observation et l'analyse de la relation enseignant-apprenants (REE). Nous verrons comment informatiser la saisie d'observations d'événements scolaires, puis comment l'enregistrement vidéo peut apporter des éléments complémentaires, à la fois du point de vue de l'observation que du développement professionnel des enseignants. Des dispositifs plus élaborés, comme l'oculométrie et les salles sensibles au contexte seront ensuite détaillés. En termes d'analyse, nous définirons l'analyse sémantique des codages d'événements, ainsi que l'analyse en réseaux sociaux, pour terminer avec de plus récentes avancées en apprentissage machine. Une réflexion sur l'éthique et la vie privée, essentielle vu le contexte, sera également menée.},
}
- Proceedings Navigation In Urban Environments Amongst Pedestrians Using Multi-Objective Deep Reinforcement Learning
 N. Deshpande, D. Vaufreydaz and A. Spalanzani
 ITSC 2021 – 24th IEEE International Conference on Intelligent Transportation Systems, pp. 1-7, Indianapolis, United States, September 2021
@inproceedings{deshpande:hal-03372856,
  title = {{Navigation In Urban Environments Amongst Pedestrians Using Multi-Objective Deep Reinforcement Learning}},
  author = {Deshpande, Niranjan and Vaufreydaz, Dominique and Spalanzani, Anne},
  booktitle = {{ITSC 2021 - 24th IEEE International Conference on Intelligent Transportation Systems}},
  hal_version = {v1},
  hal_id = {hal-03372856},
  pdf = {https://inria.hal.science/hal-03372856v1/file/ITSC2021.pdf},
  doi = {10.1109/ITSC48978.2021.9564601},
  month = {September},
  year = {2021},
  pages = {1-7},
  address = {Indianapolis, United States},
  url = {https://inria.hal.science/hal-03372856},
  abstract = {Urban autonomous driving in the presence of pedestrians as vulnerable road users is still a challenging and less examined research problem. This work formulates navigation in urban environments as a multi-objective reinforcement learning problem. A deep learning variant of thresholded lexicographic Q-learning is presented for autonomous navigation amongst pedestrians. The multi-objective DQN agent is trained on a custom urban environment developed in the CARLA simulator. The proposed method is evaluated by comparing it with a single-objective DQN variant on known and unknown environments. Evaluation results show that the proposed method outperforms the single-objective DQN variant with respect to all aspects.},
}
- Proceedings Robust collaborative collision avoidance between robots with nearly symmetric crossing trajectories
 G. Silva, K. Rekik and J. L. Crowley
 ECMR 2021 – European Conference on Mobile Robots, Bonn, Germany, August 2021
@inproceedings{silva:hal-03574780,
  title = {{Robust collaborative collision avoidance between robots with nearly symmetric crossing trajectories}},
  author = {Silva, Grimaldo and Rekik, Khansa and Crowley, James L.},
  booktitle = {{ECMR 2021 - European Conference on Mobile Robots}},
  hal_version = {v1},
  hal_id = {hal-03574780},
  pdf = {https://hal.science/hal-03574780v1/file/Silva-ECMR2021.pdf},
  keywords = {Mobile robot navigation ; mobile robot control ; Collaborative AI},
  month = {August},
  year = {2021},
  address = {Bonn, Germany},
  url = {https://hal.science/hal-03574780},
  abstract = {The growth in both acceptance and usage of mobile robots has given rise to novel challenges in robot navigation. Often, robots that share a space but are unable to communicate are required to safely avoid each other even under sensor noise. Current approaches have often relied on the assumption that collaboration is always done correctly; in practice, sensor noise might lead robots to make avoidance motions that are not mutually beneficial and do not actually decrease the collision risk. Our approach intends to mitigate the negative impact of sensor noise in collaborative collision avoidance of robots. As a consequence, even if robots initially take non-mutually beneficial avoidance motions, they would correctly perceive their role in the next decision step.},
}
- Proceedings A distillation-based approach integrating continual learning and federated learning for pervasive services
 A. Usmanova, F. Portet, P. Lalanda and G. Vega
 3rd Workshop on Continual and Multimodal Learning for Internet of Things —  Co-located with IJCAI 2021, Montreal, Canada, August 2021
@inproceedings{usmanova:hal-03332046,
  title = {{A distillation-based approach integrating continual learning and federated learning for pervasive services}},
  author = {Usmanova, Anastasiia and Portet, Fran{\c c}ois and Lalanda, Philippe and Vega, German},
  booktitle = {{3rd Workshop on Continual and Multimodal Learning for Internet of Things --  Co-located with IJCAI 2021}},
  hal_version = {v1},
  hal_id = {hal-03332046},
  pdf = {https://hal.science/hal-03332046v1/file/ijcai21-multiauthor.pdf},
  month = {August},
  year = {2021},
  address = {Montreal, Canada},
  url = {https://hal.science/hal-03332046},
  abstract = {Federated Learning, a new machine learning paradigm enhancing the use of edge devices, is receiving a lot of attention in the pervasive community to support the development of smart services. Nevertheless, this approach still needs to be adapted to the specificity of the pervasive domain. In particular, issues related to continual learning need to be addressed. In this paper, we present a distillation-based approach dealing with catastrophic forgetting in a federated learning scenario. Specifically, Human Activity Recognition tasks are used as a demonstration domain.},
}
- Proceedings Automated Planning to Evolve Smart Grids with Renewable Energies
 S. Castellanos-Paez, M. Alvarez-Herault and P. Lalanda
 IFIP Advances in Information and Communication Technology, vol. AICT-637, pp. 141-155, Montreal, QC, Canada, August 2021
@inproceedings{castellanospaez:hal-04120827,
  title = {{Automated Planning to~Evolve Smart Grids with~Renewable Energies}},
  author = {Castellanos-Paez, Sandra and Alvarez-Herault, Marie-Cecile and Lalanda, Philippe},
  booktitle = {{IFIP Advances in Information and Communication Technology}},
  hal_version = {v2},
  hal_id = {hal-04120827},
  pdf = {https://inria.hal.science/hal-04120827v2/file/528187_1_En_11_Chapter.pdf},
  keywords = {Smart grids ; Distribution network ; Distributed generation ; Automated planning},
  doi = {10.1007/978-3-030-96592-1\_11},
  month = {August},
  year = {2021},
  pages = {141-155},
  volume = {AICT-637},
  series = {Artificial Intelligence for Knowledge Management, Energy, and Sustainability},
  publisher = {{Springer International Publishing}},
  editor = {Eunika Mercier-Laurent and G{\"u}lg{\"u}n Kayakutlu},
  address = {Montreal, QC, Canada},
  url = {https://inria.hal.science/hal-04120827},
  abstract = {Smart electrical grids play a major role in energy transition but raise important software problems. Some of them can be efficiently solved by AI techniques. In particular, the increasing use of distributed generation based on renewable energies (wind, photovoltaic, among others) leads to the issue of its integration into the distribution network. The distribution network was not originally designed to accommodate generation units but to carry electricity from the distribution network to medium and low voltage consumers. Some methods have been used to automatically build target architectures to be reached within a given time horizon (of several decades) capable of accommodating a massive insertion of distributed generation while guaranteeing some technical constraints. However, these target networks may be quite different from the existing ones and therefore a direct mutation of the network would be too costly. It is therefore necessary to define the succession of works year after year to reach the target. We addressed it by translating it to an Automated Planning problem. We defined a transformation of the distribution network knowledge into a PDDL representation. The modelled domain representation was fed to a planner to obtain the set of lines to be built and deconstructed until the target is reached. Experimental analysis, on several networks at different scales, demonstrated the applicability of the approach and the reduction in reliance on expert knowledge. The objective of further work is to mutate an initial network towards a target network while minimizing the total cost and respecting technical constraints.},
}
- Preprint Apprendre en toute éthique dans les salles de classe intelligentes
 R. Laurent, P. Dessus and D. Vaufreydaz
 May 2021
@misc{laurent:hal-03239879,
  title = {{Apprendre en toute {\'e}thique dans les salles de classe intelligentes}},
  author = {Laurent, Romain and Dessus, Philippe and Vaufreydaz, Dominique},
  hal_version = {v1},
  hal_id = {hal-03239879},
  pdf = {https://hal.science/hal-03239879v1/file/QDLR_2SC_V3.3.pdf},
  keywords = {Smart classroom ; Typology},
  month = {May},
  year = {2021},
  publisher = {{R{\'e}seau Canop{\'e}}},
  editor = {Canop{\'e}},
  url = {https://hal.science/hal-03239879},
  abstract = {After digital technology, in forms that are varied but by now familiar to teachers and their students, it is now artificial intelligence that is making its way into classrooms. Computer vision, in particular, offers unprecedented opportunities to capture and analyze what happens in classrooms, with a view to amplifying human cognition. Formative feedback to teachers could be considerably enriched as a result, particularly for grasping the impact of their practices on learners, but this introduction of the "thinking machine", often presented in the scientific literature as a panacea, cannot exempt us from thinking through, before it does, all the ins and outs of such a deployment. As in every sphere of human activity into which it is now invited (or even summoned), the means and ends of artificial intelligence in schools must be questioned.},
}
- Proceedings A Federated Learning Aggregation Algorithm for Pervasive Computing: Evaluation and Comparison
 S. Ek, F. Portet, P. Lalanda and G. Vega
 19th IEEE International Conference on Pervasive Computing and Communications PerCom 2021, Kassel (virtual), Germany, March 2021
@inproceedings{ek:hal-03207411,
  title = {{A Federated Learning Aggregation Algorithm for Pervasive Computing: Evaluation and Comparison}},
  author = {Ek, Sannara and Portet, Fran{\c c}ois and Lalanda, Philippe and Vega, German},
  booktitle = {{19th IEEE International Conference on Pervasive Computing and Communications PerCom 2021}},
  hal_version = {v1},
  hal_id = {hal-03207411},
  pdf = {https://hal.science/hal-03207411v1/file/percom_2021_Federated_learning.pdf},
  keywords = {Federated Learning ; algorithm ; evaluation ; Human Activity Recognition},
  month = {March},
  year = {2021},
  address = {Kassel (virtual), Germany},
  url = {https://hal.science/hal-03207411},
  abstract = {Pervasive computing promotes the installation of connected devices in our living spaces in order to provide services. Two major developments have gained significant momentum recently: an advanced use of edge resources and the integration of machine learning techniques for engineering applications. This evolution raises major challenges, in particular related to the appropriate distribution of computing elements along an edge-to-cloud continuum. In this regard, Federated Learning has been recently proposed for distributed model training in the edge. The principle of this approach is to aggregate models learned on distributed clients in order to obtain a new, more general model. The resulting model is then redistributed to clients for further training. To date, the most popular federated learning algorithm uses coordinate-wise averaging of the model parameters for aggregation. However, it has been shown that this method is not adapted in heterogeneous environments where data is not identically and independently distributed (non-iid). This corresponds directly to some pervasive computing scenarios where heterogeneity of devices and users challenges machine learning with the double objective of generalization and personalization. In this paper, we propose a novel aggregation algorithm, termed FedDist, which is able to modify its model architecture (here, deep neural network) by identifying dissimilarities between specific neurons amongst the clients. This makes it possible to account for clients' specificity without impairing generalization. Furthermore, we define a complete method to evaluate federated learning in a realistic way taking generalization and personalization into account. Using this method, FedDist is extensively tested and compared with three state-of-the-art federated learning algorithms on the pervasive domain of Human Activity Recognition with smartphones.},
}
- Proceedings Color-based Fusion of MRI Modalities for Brain Tumor Segmentation
 N. Aboubakr, M. Popova and J. L. Crowley
 Lecture Notes in Electrical Engineering, vol. 784, pp. 89-97, Birmingham, United Kingdom, March 2021
@inproceedings{aboubakr:hal-03174069,
  title = {{Color-based Fusion of MRI Modalities for Brain Tumor Segmentation}},
  author = {Aboubakr, Nachwa and Popova, Mihaela and Crowley, James L.},
  booktitle = {{Lecture Notes in Electrical Engineering}},
  hal_version = {v1},
  hal_id = {hal-03174069},
  pdf = {https://hal.science/hal-03174069v1/file/Color_based_Fusion_of_MRI_Modalities_for_Brain_Tumor_Segmentation.pdf},
  keywords = {Tumor Segmentation ; MRI ; Modality Fusion ; Medical Imaging},
  doi = {10.1007/978-981-16-3880-0\_10},
  month = {March},
  year = {2021},
  pages = {89-97},
  volume = {784},
  series = {Lecture Notes in Electrical Engineering},
  publisher = {{Springer}},
  address = {Birmingham, United Kingdom},
  url = {https://hal.science/hal-03174069},
  abstract = {Most attempts to provide automatic techniques to detect and locate suspected tumors in Magnetic Resonance images (MRI) concentrate on a single MRI modality. Radiologists typically use multiple MRI modalities for such tasks. In this paper, we report on experiments for automatic detection and segmentation of tumors in which multiple MRI modalities are encoded using classical color encodings. We investigate the use of 2D convolutional networks using a classic U-Net architecture. Slice-by-slice MRI analysis for tumor detection is challenging because this task requires contextual information from 3D tissue structures. However, 3D convolutional networks are prohibitively expensive to train. To overcome this challenge, we extract a set of 2D images by projecting the 3D volume of MRI with maximum contrast. Multiple MRI modalities are then combined as independent colors to provide a color-encoded 2D image. We show experimentally that this led to better performance than slice-by-slice training while limiting the number of trainable parameters and the requirement for training data to a reasonable limit.},
}
- Proceedings On the Relevance of Extracting Macro-operators with Non-adjacent Actions: Does It Matter?
 S. Castellanos-Paez, R. Rombourg and P. Lalanda
 13th International Conference on Agents and Artificial Intelligence, Vienna, Austria, February 2021
@inproceedings{castellanospaez:hal-03131351,
  title = {{On the Relevance of Extracting Macro-operators with Non-adjacent Actions: Does It Matter?}},
  author = {Castellanos-Paez, Sandra and Rombourg, Romain and Lalanda, Philippe},
  booktitle = {{13th International Conference on Agents and Artificial Intelligence}},
  hal_version = {v1},
  hal_id = {hal-03131351},
  pdf = {https://hal.univ-grenoble-alpes.fr/hal-03131351v1/file/ICAART_2021_187_CR.pdf},
  keywords = {Automated Planning ; Macro-operators ; Learning ; Data Mining},
  month = {February},
  year = {2021},
  address = {Vienna, Austria},
  url = {https://hal.univ-grenoble-alpes.fr/hal-03131351},
  abstract = {Understanding the role that the extraction phase plays in identifying potential macro candidates to augment a domain is critical. In this paper, we present a method to analyse the link between extracting macro-operators from non-adjacent actions and the correctness of (1) the frequency and (2) the number of occurrences per plan. We carried out experiments using our method on five benchmark domains and three different planners. We found that extracting macro-operators with only adjacent actions leads to important errors in macro-operator frequency and occurrences per plan.},
}
- Proceedings ERA: Extracting planning macro-operators from adjacent and non-adjacent sequences
 S. Castellanos-Paez, R. Rombourg and P. Lalanda
 2020 Principle and Practice of Data and Knowledge Acquisition Workshop, Yokohama, Japan, January 2021
@inproceedings{castellanospaez:hal-03131334,
  title = {{ERA: Extracting planning macro-operators from adjacent and non-adjacent sequences}},
  author = {Castellanos-Paez, Sandra and Rombourg, Romain and Lalanda, Philippe},
  booktitle = {{2020 Principle and Practice of Data and Knowledge Acquisition Workshop}},
  hal_version = {v1},
  hal_id = {hal-03131334},
  pdf = {https://hal.univ-grenoble-alpes.fr/hal-03131334v1/file/PKAW_ERA_20.pdf},
  keywords = {Automated Planning ; Macro-operators ; Learning ; Data Mining},
  month = {January},
  year = {2021},
  address = {Yokohama, Japan},
  url = {https://hal.univ-grenoble-alpes.fr/hal-03131334},
  abstract = {Intuitively, Automated Planning systems capable of learning from previous experiences should be able to achieve better performance. One way to build on past experiences is to augment domains with macro-operators (i.e. frequent operator sequences). In most existing works, macros are generated from chunks of adjacent operators extracted from a set of plans. Although they provide some interesting results, this type of analysis may provide incomplete results. In this paper, we propose ERA, an automatic extraction method for macro-operators from a set of solution plans. Our algorithm is domain and planner independent and can find all macro-operator occurrences even if the operators are non-adjacent. Our method has proven to successfully find macro-operators of different lengths for six different benchmark domains. Also, our experiments highlighted the capital role of considering non-adjacent occurrences in the extraction of macro-operators.},
}
- Proceedings Evaluating Federated Learning for human activity recognition
 S. Ek, F. Portet, P. Lalanda and G. E. Vega Baez
 Workshop AI for Internet of Things, in conjunction with IJCAI-PRICAI 2020, Yokohama, Japan, January 2021
@inproceedings{ek:hal-03102880,
  title = {{Evaluating Federated Learning for human activity recognition}},
  author = {Ek, Sannara and Portet, Fran{\c c}ois and Lalanda, Philippe and Vega Baez, German Eduardo},
  booktitle = {{Workshop AI for Internet of Things, in conjunction with IJCAI-PRICAI 2020}},
  hal_version = {v1},
  hal_id = {hal-03102880},
  pdf = {https://hal.science/hal-03102880v1/file/AI4IOT-updated.pdf},
  month = {January},
  year = {2021},
  address = {Yokohama, Japan},
  url = {https://hal.science/hal-03102880},
  abstract = {Pervasive computing promotes the integration of connected electronic devices in our living environments in order to deliver advanced services. Interest in machine learning approaches for engineering pervasive applications has increased rapidly. Recently federated learning (FL) has been proposed. It has immediately attracted attention as a new machine learning paradigm promoting the use of edge servers. This new paradigm seems to fit the pervasive environment well. However, federated learning has been applied so far to very specific applications. It still remains largely conceptual and needs to be clarified and tested. Here, we present experiments performed in the domain of Human Activity Recognition (HAR) on smartphones which exhibit challenges related to model convergence.},
}
- Proceedings Behavioral decision-making for urban autonomous driving in the presence of pedestrians using Deep Recurrent Q-Network
 N. Deshpande, D. Vaufreydaz and A. Spalanzani
 ICARCV – 16th International Conference on Control, Automation, Robotics and Vision, pp. 1-9, Shenzhen, China, December 2020
@inproceedings{deshpande:hal-02977009,
  title = {{Behavioral decision-making for urban autonomous driving in the presence of pedestrians using Deep Recurrent Q-Network}},
  author = {Deshpande, Niranjan and Vaufreydaz, Dominique and Spalanzani, Anne},
  booktitle = {{ICARCV - 16th International Conference on Control, Automation, Robotics and Vision}},
  hal_version = {v1},
  hal_id = {hal-02977009},
  pdf = {https://inria.hal.science/hal-02977009v1/file/author_version.pdf},
  month = {December},
  year = {2020},
  pages = {1-9},
  address = {Shenzhen, China},
  url = {https://inria.hal.science/hal-02977009},
  abstract = {Decision making for autonomous driving in urban environments is challenging due to the complexity of the road structure and the uncertainty in the behavior of diverse road users. Traditional methods consist of manually designed rules as the driving policy, which require expert domain knowledge, are difficult to generalize and might give sub-optimal results as the environment gets complex. With reinforcement learning, by contrast, an optimal driving policy could be learned and improved automatically through several interactions with the environment. However, current research in the field of reinforcement learning for autonomous driving is mainly focused on highway setup with little to no emphasis on urban environments. In this work, a deep reinforcement learning based decision-making approach for high-level driving behavior is proposed for urban environments in the presence of pedestrians. For this, the use of Deep Recurrent Q-Network (DRQN) is explored, a method combining state-of-the-art Deep Q-Network (DQN) with a long short-term memory (LSTM) layer helping the agent gain a memory of the environment. A 3-D state representation is designed as the input combined with a well-defined reward function to train the agent for learning an appropriate behavior policy in a real-world-like urban simulator. The proposed method is evaluated for dense urban scenarios and compared with a rule-based approach and results show that the proposed DRQN based driving behavior decision maker outperforms the rule-based approach.},
}
- Proceedings Group-Level Emotion Recognition Using a Unimodal Privacy-Safe Non-Individual Approach
 A. Petrova, D. Vaufreydaz and P. Dessus
 EmotiW2020 Challenge at the 22nd ACM International Conference on Multimodal Interaction (ICMI2020), Utrecht, Netherlands, October 2020
@proceedings{petrova:hal-02937871,
  title = {{Group-Level Emotion Recognition Using a Unimodal Privacy-Safe Non-Individual Approach}},
  author = {Petrova, Anastasia and Vaufreydaz, Dominique and Dessus, Philippe},
  booktitle = {{EmotiW2020 Challenge at the 22nd ACM International Conference on Multimodal Interaction (ICMI2020)}},
  hal_version = {v1},
  hal_id = {hal-02937871},
  pdf = {https://inria.hal.science/hal-02937871v1/file/main.pdf},
  keywords = {EmotiW 2020 ; audio-video group emotion recognition ; Deep Learning ; affective computing ; privacy},
  doi = {10.48550/arXiv.2009.07013},
  month = {October},
  year = {2020},
  address = {Utrecht, Netherlands},
  url = {https://inria.hal.science/hal-02937871},
  abstract = {This article presents our unimodal privacy-safe and non-individual proposal for the audio-video group emotion recognition subtask at the Emotion Recognition in the Wild (EmotiW) Challenge 2020. This sub-challenge aims to classify in-the-wild videos into three categories: Positive, Neutral and Negative. Recent deep learning models have shown tremendous advances in analyzing interactions between people, predicting human behavior and affective evaluation. Nonetheless, their performance comes from individual-based analysis, which means summing up and averaging scores from individual detections, which inevitably leads to some privacy issues. In this research, we investigated a frugal approach towards a model able to capture the global moods from the whole image without using face or pose detection, or any individual-based feature as input. The proposed methodology mixes state-of-the-art and dedicated synthetic corpora as training sources. With an in-depth exploration of neural network architectures for group-level emotion recognition, we built a VGG-based model achieving 59.13% accuracy on the VGAF test set (eleventh place of the challenge). Given that the analysis is unimodal based only on global features and that the performance is evaluated on a real-world dataset, these results are promising and let us envision extending this model to multimodality for classroom ambiance evaluation, our final target application.},
}
- Proceedings IAS: an IoT Architectural Self-adaptation Framework
 M. T. Moghaddam, E. Rutten, P. Lalanda and G. Giraud
 ECSA 2020 – 14th European Conference on Software Architecture, pp. 1-16, L’Aquila, Italy, September 2020
@inproceedings{moghaddam:hal-02900674,
  title = {{IAS: an IoT Architectural Self-adaptation Framework}},
  author = {Moghaddam, Mahyar T and Rutten, Eric and Lalanda, Philippe and Giraud, Guillaume},
  booktitle = {{ECSA 2020 - 14th European Conference on Software Architecture}},
  hal_version = {v1},
  hal_id = {hal-02900674},
  pdf = {https://inria.hal.science/hal-02900674v1/file/ECSA_2020_final.pdf},
  keywords = {IoT ; Software architecture ; Self-adaptation ; Autonomic control ; Functional control ; Performance ; Queuing networks},
  month = {September},
  year = {2020},
  pages = {1-16},
  address = {L'Aquila, Italy},
  url = {https://inria.hal.science/hal-02900674},
  abstract = {This paper develops a generic approach to model control loops and their interaction within the Internet of Things (IoT) environments. We take advantage of MAPE-K loops to enable architectural self-adaptation. The system’s architectural setting is aligned with the adaptation goals and the components’ run-time situation and constraints. We introduce an integrated framework for IoT Architectural Self-adaptation (IAS) where functional control elements are in charge of environmental adaptation and autonomic control elements handle the functional system’s architectural adaptation. A Queuing Networks (QN) approach was used for modeling the IAS. The IAS-QN can model control levels and their interaction to perform both architectural and environmental adaptations. The IAS-QN was modeled on a smart grid system for the Melle-Longchamp area (France). Our architectural adaptation approach successfully set the propositions to enhance the performance of the electricity transmission system. This industrial use-case is a part of CPS4EU European industrial innovation project.},
}
- Proceedings Evaluation of Federated Learning Aggregation Algorithms Application to Human Activity Recognition
 S. Ek, F. Portet, P. Lalanda and G. Vega
 UbiComp/ISWC ’20: 2020 ACM International Joint Conference on Pervasive and Ubiquitous Computing and 2020 ACM International Symposium on Wearable Computers, pp. 638-643, Virtual Event Mexico, France, September 2020
@inproceedings{ek:hal-02941944,
  title = {{Evaluation of Federated Learning Aggregation Algorithms Application to Human Activity Recognition}},
  author = {Ek, Sannara and Portet, Fran{\c c}ois and Lalanda, Philippe and Vega, German},
  booktitle = {{UbiComp/ISWC '20: 2020 ACM International Joint Conference on Pervasive and Ubiquitous Computing and 2020 ACM International Symposium on Wearable Computers}},
  hal_version = {v1},
  hal_id = {hal-02941944},
  pdf = {https://hal.science/hal-02941944v1/file/ubicomp_review_formatted_sannara_3%20%281%29.pdf},
  keywords = {Federated Learning ; Edge Comp ; Human activity recognition},
  doi = {10.1145/3410530.3414321},
  month = {September},
  year = {2020},
  pages = {638-643},
  publisher = {{ACM}},
  address = {Virtual Event Mexico, France},
  url = {https://hal.science/hal-02941944},
  abstract = {Pervasive computing promotes the integration of connected electronic devices in our living spaces in order to assist us through appropriate services. Two major developments have gained significant momentum recently: a better use of fog resources and the use of AI techniques. Specifically, interest in machine learning approaches for engineering applications has increased rapidly. This paradigm seems to fit the pervasive environment well. However, federated learning has been applied so far to specific services and remains largely conceptual. It needs to be tested extensively on pervasive services partially located in the fog. In this paper, we present experiments performed in the domain of Human Activity Recognition on smartphones in order to evaluate existing algorithms.},
}
- Proceedings Proof of concept and evaluation of eye gaze enhanced relevance feedback in ecological context
 V. Sungeelee, F. Jambon and P. Mulhem
 Proceedings of the Joint Conference of the Information Retrieval Communities in Europe (CIRCLE 2020), Samatan, Gers, France, July 6-9, 2020, Samatan, France, July 2020
@inproceedings{sungeelee:hal-02972992,
  title = {{Proof of concept and evaluation of eye gaze enhanced relevance feedback in ecological context}},
  author = {Sungeelee, Vaynee and Jambon, Francis and Mulhem, Philippe},
  booktitle = {{Proceedings of the Joint Conference of the Information Retrieval Communities in Europe (CIRCLE 2020), Samatan, Gers, France, July 6-9, 2020}},
  hal_version = {v1},
  hal_id = {hal-02972992},
  pdf = {https://hal.science/hal-02972992v1/file/CIRCLE20_04.pdf},
  keywords = {proof of concept ; ecological context ; user behaviour ; eye tracking ; relevance feedback},
  month = {July},
  year = {2020},
  address = {Samatan, France},
  url = {https://hal.science/hal-02972992},
  abstract = {The major method for evaluating Information Retrieval systems still relies on the "Cranfield paradigm", supported by test collections. This sheds light on the fact that human behaviour is not considered central to Information Retrieval. For instance, some Information Retrieval systems that need user feedback to improve the relevance of results cannot be completely evaluated with classical test collections (since the interaction itself is not a part of the evaluation). Our goal is to work toward the integration of specific human behaviour in Information Retrieval. More precisely, we studied the impact of eye gaze analysis on information retrieval. The hypothesis is that acquiring the terms read by a user on the displayed result page may be beneficial for a relevance feedback mechanism, without any explicit intervention of the user. We have implemented a proof of concept which allows us to experiment with this new method of interaction with a search engine. The contributions of our work are twofold. First, the proof of concept we created shows that eye gaze enhanced relevance feedback information retrieval systems can be implemented and that their evaluation gives interesting results. Second, we propose the basis of an evaluation platform for Information Retrieval systems that takes into account user behaviour in ecological contexts.},
}
- Journal Design spatial sociotechnique : le rôle des classes sensibles au contexte
 R. Laurent, P. Dessus and D. Vaufreydaz
 Distances et Médiations des Savoirs, vol. 30, pp. 1-8, July 2020
@article{laurent:hal-02883770,
  title = {{Design spatial sociotechnique : le r{\^o}le des classes sensibles au contexte}},
  author = {Laurent, Romain and Dessus, Philippe and Vaufreydaz, Dominique},
  journal = {{Distances et M{\'e}diations des Savoirs}},
  hal_version = {v1},
  hal_id = {hal-02883770},
  pdf = {https://hal.science/hal-02883770v1/file/DMS-v-5.7.pdf},
  keywords = {Design de l'enseignement ; Ing{\'e}nierie p{\'e}dagogique ; Enseignement sup{\'e}rieur ; Espace ; Salles de classes sensibles au contexte ; {\'E}thique et vie priv{\'e}e},
  doi = {10.4000/dms.5228},
  month = {July},
  year = {2020},
  pages = {1-8},
  volume = {30},
  publisher = {{CNED-Centre national d'enseignement {\`a} distance}},
  url = {https://hal.science/hal-02883770},
  abstract = {Research in instructional design has so far been rich in theories and applications with great prescriptive power, centered mainly on the teacher. However, it still seems to lack work that accounts for teacher and learner activity in context, and that therefore has a stronger descriptive dimension. What we call "sociotechnical spatial design" can become a more comprehensive design activity than those identified previously. We show how the recent rise of context-aware classrooms, or "smart classrooms", can allow such models to emerge, and under what conditions, drawing on examples from university teaching.},
}
- Journal Ethical Teaching Analytics in a Context-Aware Classroom: A Manifesto
 R. Laurent, D. Vaufreydaz and P. Dessus
 ERCIM News, pp. 39–40, January 2020
@article{laurent:hal-02438020,
  title = {{Ethical Teaching Analytics in a Context-Aware Classroom: A Manifesto}},
  author = {Laurent, Romain and Vaufreydaz, Dominique and Dessus, Philippe},
  journal = {{ERCIM News}},
  hal_version = {v1},
  hal_id = {hal-02438020},
  pdf = {https://hal.science/hal-02438020v1/file/ERCIM%20News%20No120_FC4-img.pdf},
  keywords = {ambient classroom ; ubiquitous computing ; machine learning ; teacher cognition ; teaching analytics ; learning analytics ; ethics and privacy},
  month = {January},
  year = {2020},
  pages = {39--40},
  number = {120},
  publisher = {{ERCIM}},
  url = {https://hal.science/hal-02438020},
  abstract = {Should Big Teacher be watching you? The Teaching Lab project at Grenoble Alpes University proposes recommendations for designing smart classrooms with ethical considerations taken into account.},
}
- Journal Collaborative Smartphone-Based User Positioning in a Multiple-User Context Using Wireless Technologies
 V. Ta, T. Dao, D. Vaufreydaz and E. Castelli
 Sensors, pp. 25, January 2020
@article{ta:hal-02435610,
  title = {{Collaborative Smartphone-Based User Positioning in a Multiple-User Context Using Wireless Technologies}},
  author = {Ta, Viet-Cuong and Dao, Trung-Kien and Vaufreydaz, Dominique and Castelli, Eric},
  journal = {{Sensors}},
  hal_version = {v1},
  hal_id = {hal-02435610},
  pdf = {https://inria.hal.science/hal-02435610v1/file/sensors-20-00405%20%281%29.pdf},
  keywords = {indoor localization ; indoor navigation ; multi-sensor fusion ; multiple-user positioning},
  doi = {10.3390/s20020405},
  month = {January},
  year = {2020},
  pages = {25},
  publisher = {{MDPI}},
  url = {https://inria.hal.science/hal-02435610},
  abstract = {For the localization of multiple users, Bluetooth data from the smartphone is able to complement Wi-Fi-based methods with additional information, by providing an approximation of the relative distances between users. In practice, both positions provided by Wi-Fi data and relative distance provided by Bluetooth data are subject to a certain degree of noise due to the uncertainty of radio propagation in complex indoor environments. In this study, we propose and evaluate two approaches, namely Non-temporal and Temporal ones, of collaborative positioning to combine these two cohabiting technologies to improve the tracking performance. In the Non-temporal approach, our model establishes an error observation function in a specific interval of the Bluetooth and Wi-Fi output. It is then able to reduce the positioning error by looking for ways to minimize the error function. The Temporal approach employs an extended error model that takes into account the time component between users’ movements. For performance evaluation, several multi-user scenarios in an indoor environment are set up. Results show that for certain scenarios, the proposed approaches attain over 40% of improvement in terms of average accuracy.},
}
- Proceedings Building Prior Knowledge: A Markov Based Pedestrian Prediction Model Using Urban Environmental Data
 P. Vasishta, D. Vaufreydaz and A. Spalanzani
 ICARCV 2018 – 15th International Conference on Control, Automation, Robotics and Vision, pp. 1-12, Singapore, Singapore, November 2018
@inproceedings{vasishta:hal-01875147,
  title = {{Building Prior Knowledge: A Markov Based Pedestrian Prediction Model Using Urban Environmental Data}},
  author = {Vasishta, Pavan and Vaufreydaz, Dominique and Spalanzani, Anne},
  booktitle = {{ICARCV 2018 - 15th International Conference on Control, Automation, Robotics and Vision}},
  hal_version = {v1},
  hal_id = {hal-01875147},
  pdf = {https://inria.hal.science/hal-01875147v1/file/main_author.pdf},
  keywords = {Autonomous Vehicles ; Situational Awareness ; Pedestrian Behaviour ; Hidden Markov Models},
  month = {November},
  year = {2018},
  pages = {1-12},
  address = {Singapore, Singapore},
  url = {https://inria.hal.science/hal-01875147},
  abstract = {Autonomous Vehicles navigating in urban areas have a need to understand and predict future pedestrian behavior for safer navigation. This high level of situational awareness requires observing pedestrian behavior and extrapolating their positions to know future positions. While some work has been done in this field using Hidden Markov Models (HMMs), one of the few observed drawbacks of the method is the need for informed priors for learning behavior. In this work, an extension to the Growing Hidden Markov Model (GHMM) method is proposed to solve some of these drawbacks. This is achieved by building on existing work using potential cost maps and the principle of Natural Vision. As a consequence, the proposed model is able to predict pedestrian positions more precisely over a longer horizon compared to the state of the art. The method is tested over "legal" and "illegal" behavior of pedestrians, having trained the model with sparse observations and partial trajectories. The method, with no training data, is compared against a trained state of the art model. It is observed that the proposed method is robust even in new, previously unseen areas.},
}
- Proceedings De la carte au ciel : une approche empirique pour l’étude des stratégies visuelles de pilotes experts
 R. Balzarini and F. Jambon
 Spatial Analysis and GEOmatics (SAGEO 2018), pp. 140-142, Montpellier, France, November 2018
@inproceedings{balzarini:hal-01988274,
  title = {{De la carte au ciel : une approche empirique pour l'{\'e}tude des strat{\'e}gies visuelles de pilotes experts}},
  author = {Balzarini, Raffaella and Jambon, Francis},
  booktitle = {{Spatial Analysis and GEOmatics (SAGEO 2018)}},
  hal_version = {v1},
  hal_id = {hal-01988274},
  keywords = {cartes a{\'e}ronautiques ; oculom{\'e}trie ; repr{\'e}sentations cartographiques et mentales ; rep{\`e}res ; strat{\'e}gies visuelles},
  month = {November},
  year = {2018},
  pages = {140-142},
  address = {Montpellier, France},
  note = {Short presentation of the international article (hal-02055475): R. Balzarini and F. Jambon. From Map to Sky: an Empirical Study on Visual Strategies of Expert Pilots. 3rd International Workshop on Eye Tracking for Spatial Research, ETH Zurich, Zurich, Switzerland (doi: 10.3929/ethz-b-000222256) p. 64-69, 2018.},
  url = {https://hal.science/hal-01988274},
  abstract = {},
}
- Proceedings Service-Oriented Approach for Analytics in Industry 4.0
 P. Lalanda and D. Morand
 International Conference on Service-Oriented Computing, pp. 756-770, Hangzhou, Zhejiang, China, November 2018
@inproceedings{lalanda:hal-02014637,
  title = {{Service-Oriented Approach for Analytics in Industry 4.0}},
  author = {Lalanda, Philippe and Morand, Denis},
  booktitle = {{International Conference on Service-Oriented Computing}},
  hal_version = {v1},
  hal_id = {hal-02014637},
  month = {November},
  year = {2018},
  pages = {756-770},
  address = {Hangzhou, Zhejiang, China},
  url = {https://hal.science/hal-02014637},
  abstract = {},
}
- Proceedings The Role of Emotion in Problem Solving: First Results from Observing Chess
 T. Guntz, J. L. Crowley, D. Vaufreydaz, R. Balzarini and P. Dessus
 ICMI 2018 – Workshop at 20th ACM International Conference on Multimodal Interaction, pp. 1-13, Boulder, Colorado, United States, October 2018
@inproceedings{guntz:hal-01886694,
  title = {{The Role of Emotion in Problem Solving: First Results from Observing Chess}},
  author = {Guntz, Thomas and Crowley, James L. and Vaufreydaz, Dominique and Balzarini, Raffaella and Dessus, Philippe},
  booktitle = {{ICMI 2018 - Workshop at 20th ACM International Conference on Multimodal Interaction}},
  hal_version = {v1},
  hal_id = {hal-01886694},
  pdf = {https://inria.hal.science/hal-01886694v1/file/main.pdf},
  keywords = {Problem Solving ; Emotions ; Situation Modeling ; Concept Formation ; Chunking ; Working memory},
  month = {October},
  year = {2018},
  pages = {1-13},
  address = {Boulder, Colorado, United States},
  url = {https://inria.hal.science/hal-01886694},
  abstract = {In this paper we present results from recent experiments that suggest that chess players associate emotions with game situations and reactively use these associations to guide search for planning and problem solving. We describe the design of an instrument for capturing and interpreting multimodal signals of humans engaged in solving challenging problems. We review results from a pilot experiment with human experts engaged in solving challenging problems in chess that revealed an unexpected observation of rapid changes in emotion as players attempt to solve challenging problems. We propose a cognitive model that describes the process by which subjects select chess chunks for use in interpretation of the game situation and describe initial results from a second experiment designed to test this model.},
}
- Proceedings Smartphone-based user positioning in a multiple-user context with Wi-Fi and Bluetooth
 V. Ta, T. Dao, D. Vaufreydaz and E. Castelli
 IPIN 2018 – 9th International Conference on Indoor Positioning and Indoor Navigation, pp. 1-13, Nantes, France, September 2018
@inproceedings{ta:hal-01839574,
  title = {{Smartphone-based user positioning in a multiple-user context with Wi-Fi and Bluetooth}},
  author = {Ta, Viet-Cuong and Dao, Trung-Kien and Vaufreydaz, Dominique and Castelli, Eric},
  booktitle = {{IPIN 2018 - 9th International Conference on Indoor Positioning and Indoor Navigation}},
  hal_version = {v1},
  hal_id = {hal-01839574},
  pdf = {https://inria.hal.science/hal-01839574v1/file/SmarphoneCollaborativeLocalization.pdf},
  keywords = {Wi-Fi ; Bluetooth ; smartphone applications ; collaborative positioning ; indoor navigation ; indoor localization},
  month = {September},
  year = {2018},
  pages = {1-13},
  address = {Nantes, France},
  url = {https://inria.hal.science/hal-01839574},
  abstract = {In a multiuser context, the Bluetooth data from the smartphone could give an approximation of the distance between users. Meanwhile, the Wi-Fi data can be used to calculate the user's position directly. However, both the Wi-Fi-based position outputs and Bluetooth-based distances are affected by some degree of noise. In our work, we propose two approaches to combine the two types of outputs for improving the tracking accuracy in the context of collaborative positioning. Both approaches attempt to build a model for measuring the errors of the Bluetooth output and Wi-Fi output. In the non-temporal approach, the model establishes the relationship in a specific interval of the Bluetooth output and Wi-Fi output. In the temporal approach, the error measurement model is expanded to include the time component between users' movements. To evaluate the performance of the two approaches, we collected data from several multiuser scenarios in an indoor environment. The results show that the proposed approaches could reach a distance error of around 3.0 m for 75 percent of the time, which outperforms the positioning results of the standard Wi-Fi fingerprinting model.},
}
- Journal XWARE-A customizable interoperability framework for pervasive computing systems
 F. M. Roth, C. Becker, G. Vega and P. Lalanda
 Pervasive and Mobile Computing, vol. 47, pp. 13-30, July 2018
@article{roth:hal-02023356,
  title = {{XWARE-A customizable interoperability framework for pervasive computing systems}},
  author = {Roth, Felix Maximilian and Becker, Christian and Vega, German and Lalanda, Philippe},
  journal = {{Pervasive and Mobile Computing}},
  hal_version = {v1},
  hal_id = {hal-02023356},
  doi = {10.1016/j.pmcj.2018.03.005},
  month = {July},
  year = {2018},
  pages = {13-30},
  volume = {47},
  publisher = {{Elsevier}},
  url = {https://hal.science/hal-02023356},
  abstract = {},
}
- Ph.D. Thesis Perception multimodale et interaction sociable
 Dominique Vaufreydaz
 July 2018
@phdthesis{vaufreydaz:tel-01970420,
  title = {{Perception multimodale et interaction sociable}},
  author = {Vaufreydaz, Dominique},
  hal_version = {v1},
  hal_id = {tel-01970420},
  pdf = {https://inria.hal.science/tel-01970420v1/file/HDR-Vaufreydaz.pdf},
  type = {Accreditation to supervise research},
  keywords = {signal processing ; computer vision ; multimodal perception ; machine learning ; sociable interaction ; interaction sociable ; apprentissage machine ; perception multimodale ; vision par ordinateur ; traitement du signal},
  month = {July},
  year = {2018},
  school = {{Universit{\'e} Grenoble Alpes (France) ; MSTII}},
  url = {https://inria.hal.science/tel-01970420},
  abstract = {One of the most complex tasks for which computers have been programmed is mimicking human perception and interaction abilities, first using monomodal information (acoustic, visual, tactile, proprioceptive, …) and then multimodal information that combines several modalities. Building on these perception abilities, interactive systems, that is, systems interacting with humans, can be aware of their surrounding environment, the users present, the current situation… This allows them to perceive, understand and predict in order to act accordingly, or even to act sociably so as to be a full-fledged partner of humans. Multimodal computer perception and sociable interaction have been the underlying research questions of my work since my recruitment as an Associate Professor (Maître de conférences) in 2005, with signal processing and machine learning as their foundations. This manuscript presents my work on multimodal perception and sociable interaction in several contexts, grouped around my main research themes. It first addresses ubiquitous multimodal perception within multimodal perceptive spaces such as augmented meeting rooms, apartments equipped to help elderly or frail people remain at home, and larger-scale spaces such as the buildings of a university campus. Following advances in robotics, this perception naturally moved from perceptive environments to mobile robots, enabling sociable interaction between humans and companion robots (Human Robot Interaction - HRI) as well as with a particular kind of robot, autonomous vehicles.
Research on the perception of humans and their affects is then presented through my work on near-field perception (< 1 m) and on detecting humans and their behaviours around our interactive systems, a necessary basis for their operation. Our preliminary work on people detection using deep learning is described. The manuscript closes by presenting the directions and perspectives of my research project entitled "Perception multimodale et interaction sociable" (multimodal perception and sociable interaction).},
}
- Proceedings Personal space of autonomous car's passengers sitting in the driver's seat
 E. Ferrier-Barbut, D. Vaufreydaz, J. David, J. Lussereau and A. Spalanzani
 IV'2018 - The 29th IEEE Intelligent Vehicles Symposium, pp. 2022-2029, Changshu, Suzhou, China, June 2018
@inproceedings{ferrierbarbut:hal-01786006,
  title = {{Personal space of autonomous car's passengers sitting in the driver's seat}},
  author = {Ferrier-Barbut, Eleonore and Vaufreydaz, Dominique and David, Jean-Alix and Lussereau, J{\'e}r{\^o}me and Spalanzani, Anne},
  booktitle = {{IV'2018 - The 29th IEEE Intelligent Vehicles Symposium}},
  hal_version = {v1},
  hal_id = {hal-01786006},
  pdf = {https://inria.hal.science/hal-01786006v1/file/PersonalSpaceAutonomousCar.pdf},
  doi = {10.1109/IVS.2018.8500648},
  month = {June},
  year = {2018},
  pages = {2022-2029},
  publisher = {{IEEE}},
  address = {Changshu, Suzhou, China},
  url = {https://inria.hal.science/hal-01786006},
  abstract = {This article deals with the specific context of an autonomous car navigating in an urban center within a shared space between pedestrians and cars. The driver delegates control to the autonomous system while remaining seated in the driver's seat. The proposed study aims at giving a first insight into the definition of human perception of space applied to vehicles by testing the existence of a personal space around the car. It aims at measuring proxemic information about the driver's comfort zone in such conditions. Proxemics, or human perception of space, has been largely explored when applied to humans or to robots, leading to the concept of personal space, but poorly when applied to vehicles. In this article, we highlight the existence and the characteristics of a zone of comfort around the car which is not correlated to the risk of a collision between the car and other road users. Our experiment includes 19 volunteers using a virtual reality headset to look at 30 scenarios filmed in 360° from the point of view of a passenger sitting in the driver's seat of an autonomous car. They were asked to say "stop" when they felt discomfort visualizing the scenarios. The scenarios deliberately avoid any collision effect, as we want to measure discomfort rather than fear. The scenarios involve one or three pedestrians walking past the car at different distances from the wings of the car, relative to the direction of motion of the car, on both sides. The car is either static or moving straight forward at different speeds. The results indicate the existence of a comfort zone around the car in which intrusion causes discomfort. The size of the comfort zone is sensitive neither to the side of the car where the pedestrian passes nor to the number of pedestrians.
In contrast, the feeling of discomfort depends on the car's motion (static or moving). Another outcome of this study is an illustration of the use of first-person 360° video and a virtual reality headset to evaluate the feelings of a passenger within an autonomous car.},
}
- Proceedings A Framework for a Multimodal Analysis of Teaching Centered on Shared Attention and Knowledge Access
 P. Dessus, L. Aubineau, D. Vaufreydaz and J. L. Crowley
 Grenoble Workshop on Models and Analysis of Eye Movements, Grenoble, France, June 2018
@inproceedings{dessus:hal-01811092,
  title = {{A Framework for a Multimodal Analysis of Teaching Centered on Shared Attention and Knowledge Access}},
  author = {Dessus, Philippe and Aubineau, Louise-H{\'e}l{\'e}na and Vaufreydaz, Dominique and Crowley, James L.},
  booktitle = {{Grenoble Workshop on Models and Analysis of Eye Movements}},
  hal_version = {v1},
  hal_id = {hal-01811092},
  pdf = {https://hal.science/hal-01811092v1/file/eye-mov-18.pdf},
  keywords = {Eye tracking ; Classroom Observation ; Joint Attention ; Teacher cognition},
  month = {June},
  year = {2018},
  address = {Grenoble, France},
  url = {https://hal.science/hal-01811092},
  abstract = {The effects of teaching on learning are mostly uncertain, hidden, and not immediate. Research investigating how teaching can have an impact on learning has recently been given a significant boost with signal processing devices and data mining analyses. We devised a framework for the study of teaching and learning processes which posits that lessons are composed of episodes of joint attention and access to the taught content, and that the interplay of behaviors like joint attention, actional contingency, and feedback loops composes different levels of teaching. Teaching by social tolerance occurs when learners (Ls) have no attentional problems but their access to the taught knowledge depends on the teacher (T). Teaching by opportunity provisioning occurs when Ls can be aware of the taught content but lack access to it (e.g., lack of understanding), and T builds ad hoc situations in which Ls are provided with easier content. Teaching by stimulus or local enhancement occurs when Ls have full access to the content but lack attention toward it: T explicitly shows content to Ls, slows down her behaviors, and tells and acts in an adapted way (e.g., motherese). A variety of devices installed in a classroom will capture and automatically characterize these events. T’s and Ls’ utterances and gazes will be recorded through low-cost cameras installed on 3D-printed glasses, and T will wear a mobile eye tracker and a mobile microphone. Instructional material is equipped with QR codes so that Ls’ and T’s video streams can be processed to determine where people are looking and to infer the corresponding teaching levels. This novel framework will be used to analyze instructional events in ecological situations, and will be a first step toward building a “pervasive classroom”, where eye-tracking and sensor-based devices analyze a wide range of events in a multimodal and interdisciplinary way.},
}
- Proceedings A Hybrid Architecture for Non-Technical Skills Diagnosis
 Y. Bourrier, F. Jambon, C. Garbay and V. Luengo
 Lecture Notes in Computer Science, vol. 10858, pp. 300-305, Montreal, Canada, June 2018
@inproceedings{bourrier:hal-01774652,
  title = {{A Hybrid Architecture for Non-Technical Skills Diagnosis}},
  author = {Bourrier, Yannick and Jambon, Francis and Garbay, Catherine and Luengo, Vanda},
  booktitle = {{Lecture Notes in Computer Science}},
  hal_version = {v1},
  hal_id = {hal-01774652},
  keywords = {Ill-defined domains ; Neural networks ; Bayesian networks},
  doi = {10.1007/978-3-319-91464-0\_31},
  month = {June},
  year = {2018},
  pages = {300-305},
  volume = {10858},
  series = {Lecture Notes in Computer Science},
  publisher = {{Springer}},
  address = {Montreal, Canada},
  url = {https://hal.science/hal-01774652},
  abstract = {Our Virtual Learning Environment aims at improving the abilities of experienced technicians to handle critical situations through appropriate use of non-technical skills (NTS), a high-stakes matter in many domains, as poor mobilization of these skills is the cause of many accidents. To do so, our environment dynamically generates critical situations designed to target these NTS. As the situations need to be adapted to the learner’s skill level, we designed a hybrid architecture able to diagnose NTS. This architecture combines symbolic knowledge about situations, a neural network to drive the learner’s performance evaluation process, and a Bayesian network to model the causality links between situation knowledge and performance to reach NTS diagnosis. A proof of concept is presented for a critical driving situation.},
}
- Preprint Toward Eye Gaze Enhanced Information Retrieval Relevance Feedback
 F. Jambon, P. Mulhem and L. Albarede
 pp. 23, June 2018
@misc{francis:hal-01885125,
  title = {{Toward Eye Gaze Enhanced Information Retrieval Relevance Feedback}},
  author = {Jambon, Francis and Mulhem, Philippe and Albarede, Lucas},
  hal_version = {v1},
  hal_id = {hal-01885125},
  keywords = {Eye Fixations on Words ; Information Retrieval ; Relevance Feedback},
  month = {June},
  year = {2018},
  pages = {23},
  howpublished = {{Grenoble Workshop on Models and Analysis of Eye Movements}},
  note = {Poster},
  url = {https://hal.science/hal-01885125},
  abstract = {Information Retrieval (IR) is dedicated to retrieving relevant documents according to a user’s query. The literature in this field shows that gathering relevance information provided by the user on the documents retrieved by the IR system increases the overall quality of the system. The relevance information provided by the user is processed to refine his/her initial query, in a process called Relevance Feedback. Since it is cumbersome and time consuming for the user to explicitly provide such information, our hypothesis is that eye gaze information could be used to implicitly estimate the user’s interests, and thus help the relevance feedback mechanism. The main research question tackled here is twofold: (1) what is the user behavioral model at the visual level in an information retrieval task, and how would this model determine the user’s interests; and (2) how to effectively integrate such eye gaze elements into a relevance feedback mechanism in classical IR systems that present result lists with document extracts (called snippets). To achieve this goal, we split the problem into the following steps: (a) to model the user behaviour in front of a result list composed of snippets; (b) to define the eye gaze elements to be acquired and the way to link them to the user’s interests in document contents; (c) to build relevance feedback mechanisms that are able to use these elements; and (d) to evaluate the proposal on classical IR test collections and compare it to other relevance feedback approaches. The work presented here focuses on the first two steps above: we define an experimental context to gather relevant information about the user’s behaviour in front of a result display composed of snippets, and we deduce the EM elements that will need to be acquired in order to perform IR relevance feedback.},
}
- Proceedings Autonomic caching management in industrial smart gateways
 P. Lalanda, J. Mertz and I. Nunes
 2018 IEEE Industrial Cyber-Physical Systems (ICPS), pp. 26-31, St. Petersburg, Russia, May 2018
@inproceedings{lalanda:hal-02023380,
  title = {{Autonomic caching management in industrial smart gateways}},
  author = {Lalanda, Philippe and Mertz, Jhonny and Nunes, Ingrid},
  booktitle = {{2018 IEEE Industrial Cyber-Physical Systems (ICPS)}},
  hal_version = {v1},
  hal_id = {hal-02023380},
  month = {May},
  year = {2018},
  pages = {26-31},
  publisher = {{IEEE}},
  address = {St. Petersburg, Russia},
  url = {https://hal.science/hal-02023380},
  abstract = {},
}
- Proceedings Un réseau bayésien pour le diagnostic des compétences non-techniques en situation critique
 Y. Bourrier, F. Jambon, C. Garbay and V. Luengo
 Journées Francophones sur les Réseaux Bayésiens et les Modèles Graphiques Probabilistes, Toulouse, France, May 2018
@inproceedings{bourrier:hal-01857604,
  title = {{Un r{\'e}seau bay{\'e}sien pour le diagnostic des comp{\'e}tences non-techniques en situation critique}},
  author = {Bourrier, Yannick and Jambon, Francis and Garbay, Catherine and Luengo, Vanda},
  booktitle = {{Journ{\'e}es Francophones sur les R{\'e}seaux Bay{\'e}siens et les Mod{\`e}les Graphiques Probabilistes}},
  hal_version = {v1},
  hal_id = {hal-01857604},
  month = {May},
  year = {2018},
  address = {Toulouse, France},
  url = {https://hal.science/hal-01857604},
  abstract = {},
}
- Journal Multimodal Observation and Classification of People Engaged in Problem Solving: Application to Chess Players
 T. Guntz, R. Balzarini, D. Vaufreydaz and J. L. Crowley
 Multimodal Technologies and Interaction, vol. 2, no. 2, March 2018
@article{guntz:hal-01886354,
  title = {{Multimodal Observation and Classification of People Engaged in Problem Solving: Application to Chess Players}},
  author = {Guntz, Thomas and Balzarini, Raffaella and Vaufreydaz, Dominique and Crowley, James L.},
  journal = {{Multimodal Technologies and Interaction}},
  hal_version = {v1},
  hal_id = {hal-01886354},
  pdf = {https://inria.hal.science/hal-01886354v1/file/mti-02-00011.pdf},
  keywords = {multimodal perception ; affective computing ; situation awareness},
  doi = {10.3390/mti2020011},
  month = {March},
  year = {2018},
  number = {2},
  volume = {2},
  publisher = {{MDPI}},
  url = {https://inria.hal.science/hal-01886354},
  abstract = {In this paper we present the first results of a pilot experiment in the interpretation of multimodal observations of human experts engaged in solving challenging chess problems. Our goal is to investigate the extent to which observations of eye-gaze, posture, emotion and other physiological signals can be used to model the cognitive state of subjects, and to explore the integration of multiple sensor modalities to improve the reliability of detection of human displays of awareness and emotion. Domains of application for such cognitive model based systems are, for instance, healthy autonomous ageing or automated training systems. Abilities to observe cognitive abilities and emotional reactions can allow artificial systems to provide appropriate assistance in such contexts. We observed chess players engaged in problems of increasing difficulty while recording their behavior. Such recordings can be used to estimate a participant's awareness of the current situation and to predict ability to respond effectively to challenging situations. Feature selection has been performed to construct a multimodal classifier relying on the most relevant features from each modality. Initial results indicate that eye-gaze, body posture and emotion are good features to capture such awareness. This experiment also validates the use of our equipment as a general and reproducible tool for the study of participants engaged in screen-based interaction and/or problem solving.},
}
- Proceedings An Interoperable Notification Service for Pervasive Computing
 F. M. Roth, M. Pfannemueller, C. Becker and P. Lalanda
 2018 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), pp. 842-847, Athens, Greece, March 2018
@inproceedings{roth:hal-02023389,
  title = {{An Interoperable Notification Service for Pervasive Computing}},
  author = {Roth, Felix Maximilian and Pfannemueller, Martin and Becker, Christian and Lalanda, Philippe},
  booktitle = {{2018 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops)}},
  hal_version = {v1},
  hal_id = {hal-02023389},
  month = {March},
  year = {2018},
  pages = {842-847},
  publisher = {{IEEE}},
  address = {Athens, Greece},
  url = {https://hal.science/hal-02023389},
  abstract = {},
}
- Proceedings PEAR: Prototyping Expressive Animated Robots - A framework for social robot prototyping
 E. Balit, D. Vaufreydaz and P. Reignier
 HUCAPP 2018 - 2nd International Conference on Human Computer Interaction Theory and Applications, pp. 1, Funchal, Madeira, Portugal, January 2018
@inproceedings{balit:hal-01698493,
  title = {{PEAR: Prototyping Expressive Animated Robots - A framework for social robot prototyping}},
  author = {Balit, Etienne and Vaufreydaz, Dominique and Reignier, Patrick},
  booktitle = {{HUCAPP 2018 - 2nd International Conference on Human Computer Interaction Theory and Applications}},
  hal_version = {v1},
  hal_id = {hal-01698493},
  pdf = {https://inria.hal.science/hal-01698493v1/file/HUCAPP.pdf},
  keywords = {Robot Animation ; Robot Prototyping Tool ; Social Robot ; Expressive Robot ; Animation Software ; Blender},
  month = {January},
  year = {2018},
  pages = {1},
  address = {Funchal, Madeira, Portugal},
  url = {https://inria.hal.science/hal-01698493},
  abstract = {Social robots are transitioning from lab experiments to commercial products, creating new needs for prototyping and design tools. In this paper, we present a framework to facilitate the prototyping of expressive animated robots. For this, we start by reviewing the design of existing social robots in order to define a set of basic components of social robots. We then show how to extend existing 3D animation software to enable the animation of these components. By composing those basic components, robots of various morphologies can be prototyped and animated. We show the capabilities of the presented framework through two case studies.},
}
- Proceedings From Map to Sky: An Empirical Study on Visual Strategies of Expert Pilots
 R. Balzarini and F. Jambon
 3rd International Workshop on Eye Tracking for Spatial Research, Zurich, Switzerland, January 2018
@inproceedings{balzarini:hal-02055475,
  title = {{From Map to Sky: An Empirical Study on Visual Strategies of Expert Pilots}},
  author = {Balzarini, Raffaella and Jambon, Francis},
  booktitle = {{3rd International Workshop on Eye Tracking for Spatial Research}},
  hal_version = {v1},
  hal_id = {hal-02055475},
  keywords = {aeronautical charts ; eye-tracking ; visual attention ; matching ; cartographic and mental representation ; mental simulation},
  doi = {10.3929/ethz-b-000222485},
  month = {January},
  year = {2018},
  address = {Zurich, Switzerland},
  url = {https://hal.science/hal-02055475},
  abstract = {For pilots, making the right choice of waypoints on an aeronautical chart for navigation is a complex cognitive task. This task involves the pilot's ability to match the geographical objects depicted on the maps with the mental representation of the corresponding physical objects actually present in the natural environment. Our objective is to investigate the mental strategies underlying expert pilots’ waypoint selection and to model them for training. This article presents a first modeling step, based on an empirical study using eye-tracking methodologies combined with other experimental techniques.},
}
- Proceedings Pedestrian detection and behaviors modelling in Urban environment 
 D. Vaufreydaz
 SMIV 2017 - Smart Mobility and Intelligent Vehicles, Versailles, France, November 2017
@inproceedings{vaufreydaz:hal-01766977,
  title = {{Pedestrian detection and behaviors modelling in Urban environment }},
  author = {Vaufreydaz, Dominique},
  booktitle = {{SMIV 2017 - Smart Mobility and Intelligent Vehicles}},
  hal_version = {v1},
  hal_id = {hal-01766977},
  month = {November},
  year = {2017},
  address = {Versailles, France},
  url = {https://inria.hal.science/hal-01766977},
  abstract = {},
}
- Proceedings Figurines, a multimodal framework for tangible storytelling
 M. Portaz, M. Garcia, A. Barbulescu, A. Begault, L. Boissieux, M. Cani, R. Ronfard and D. Vaufreydaz
 WOCCI 2017 - 6th Workshop on Child Computer Interaction at ICMI 2017 - 19th ACM International Conference on Multi-modal Interaction, pp. 52-57, Glasgow, United Kingdom, November 2017
@inproceedings{portaz:hal-01595775,
  title = {{Figurines, a multimodal framework for tangible storytelling}},
  author = {Portaz, Maxime and Garcia, Maxime and Barbulescu, Adela and Begault, Antoine and Boissieux, Laurence and Cani, Marie-Paule and Ronfard, R{\'e}mi and Vaufreydaz, Dominique},
  booktitle = {{WOCCI 2017 - 6th Workshop on Child Computer Interaction at ICMI 2017 - 19th ACM International Conference on Multi-modal Interaction}},
  hal_version = {v2},
  hal_id = {hal-01595775},
  pdf = {https://inria.hal.science/hal-01595775v2/file/Figurines.pdf},
  keywords = {Puppetry ; Storytelling ; Multimodal data fusion ; IMU sensor ; RGB-D sensor},
  doi = {10.21437/WOCCI.2017-9},
  month = {November},
  year = {2017},
  pages = {52-57},
  address = {Glasgow, United Kingdom},
  note = {Author version},
  url = {https://inria.hal.science/hal-01595775},
  abstract = {This paper presents Figurines, an offline framework for narrative creation with tangible objects, designed to record storytelling sessions with children, teenagers or adults. This framework uses tangible diegetic objects to record a free narrative from up to two storytellers and construct a fully annotated representation of the story. This representation is composed of the 3D position and orientation of the figurines, the position of decor elements and an interpretation of the storytellers' actions (facial expression, gestures and voice). While maintaining the playful dimension of the storytelling session, the system must tackle the challenge of recovering the free-form motion of the figurines and the storytellers in uncontrolled environments. To do so, we record the storytelling session using a hybrid setup with two RGB-D sensors and figurines augmented with IMU sensors. The first RGB-D sensor complements IMU information in order to identify figurines and tracks them as well as decor elements. It also tracks the storytellers jointly with the second RGB-D sensor. The framework has been used to record preliminary experiments to validate the interest of our approach. These experiments evaluate figurine following and the combination of motion with the storyteller's voice, gestures and facial expressions. In a make-believe game, this story representation was re-targeted on virtual characters to produce an animated version of the story. The final goal of the Figurines framework is to enhance our understanding of the creative processes at work during immersive storytelling.},
}
- Proceedings Autonomic management of context data based on application requirements
 J. Mertz, V. Zapalowski, P. Lalanda and I. Nunes
 IECON 2017 - 43rd Annual Conference of the IEEE Industrial Electronics Society, pp. 8622-8627, Beijing, China, October 2017
@inproceedings{mertz:hal-02023401,
  title = {{Autonomic management of context data based on application requirements}},
  author = {Mertz, Jhonny and Zapalowski, Vanius and Lalanda, Philippe and Nunes, Ingrid},
  booktitle = {{IECON 2017 - 43rd Annual Conference of the IEEE Industrial Electronics Society}},
  hal_version = {v1},
  hal_id = {hal-02023401},
  month = {October},
  year = {2017},
  pages = {8622-8627},
  publisher = {{IEEE}},
  address = {Beijing, China},
  url = {https://hal.science/hal-02023401},
  abstract = {},
}
- Proceedings Multimodal Observation and Interpretation of Subjects Engaged in Problem Solving
 T. Guntz, R. Balzarini, D. Vaufreydaz and J. L. Crowley
 1st Workshop on ``Behavior, Emotion and Representation: Building Blocks of Interaction'', Bielefeld, Germany, October 2017
@inproceedings{guntz:hal-01615461,
  title = {{Multimodal Observation and Interpretation of Subjects Engaged in Problem Solving}},
  author = {Guntz, Thomas and Balzarini, Raffaella and Vaufreydaz, Dominique and Crowley, James L.},
  booktitle = {{1st Workshop on ``Behavior, Emotion and Representation: Building Blocks of Interaction''}},
  hal_version = {v1},
  hal_id = {hal-01615461},
  pdf = {https://inria.hal.science/hal-01615461v1/file/main.pdf},
  keywords = {Affective Computing ; Chess Problem Solving ; Multimodal Perception ; Eye Tracking},
  month = {October},
  year = {2017},
  address = {Bielefeld, Germany},
  url = {https://inria.hal.science/hal-01615461},
  abstract = {In this paper we present the first results of a pilot experiment in the capture and interpretation of multimodal signals of human experts engaged in solving challenging chess problems. Our goal is to investigate the extent to which observations of eye-gaze, posture, emotion and other physiological signals can be used to model the cognitive state of subjects, and to explore the integration of multiple sensor modalities to improve the reliability of detection of human displays of awareness and emotion. We observed chess players engaged in problems of increasing difficulty while recording their behavior. Such recordings can be used to estimate a participant's awareness of the current situation and to predict their ability to respond effectively to challenging situations. Results show that a multimodal approach is more accurate than a unimodal one. By combining body posture, visual attention and emotion, the multimodal approach reaches up to 93% accuracy when determining a player's chess expertise, while the unimodal approach reaches 86%. Finally, this experiment validates the use of our equipment as a general and reproducible tool for the study of participants engaged in screen-based interaction and/or problem solving.},
}
- Proceedings Natural Vision Based Method for Predicting Pedestrian Behaviour in Urban Environments
 P. Vasishta, D. Vaufreydaz and A. Spalanzani
 IEEE 20th International Conference on Intelligent Transportation Systems, Yokohama, Japan, October 2017
@inproceedings{vasishta:hal-01561029,
  title = {{Natural Vision Based Method for Predicting Pedestrian Behaviour in Urban Environments}},
  author = {Vasishta, Pavan and Vaufreydaz, Dominique and Spalanzani, Anne},
  booktitle = {{IEEE 20th International Conference on Intelligent Transportation Systems}},
  hal_version = {v1},
  hal_id = {hal-01561029},
  pdf = {https://inria.hal.science/hal-01561029v1/file/main.pdf},
  keywords = {Pedestrian Behaviour ; Natural Vision ; Potential Fields},
  doi = {10.1109/ITSC.2017.8317848},
  month = {October},
  year = {2017},
  address = {Yokohama, Japan},
  url = {https://inria.hal.science/hal-01561029},
  abstract = {This paper proposes to model pedestrian behaviour in urban scenes by combining the principles of urban planning and the sociological concept of Natural Vision. This model assumes that the environment perceived by pedestrians is composed of multiple potential fields that influence their behaviour. These fields are derived from static scene elements such as sidewalks, crosswalks, buildings and shop entrances, and from dynamic obstacles such as cars and buses. Using this model, autonomous cars increase their level of situational awareness in the local urban space, with the ability to infer probable pedestrian paths in the scene to predict, for example, legal and illegal crossings.},
}
- Proceedings Urban Pedestrian Behaviour Modelling using Natural Vision and Potential Fields
 P. Vasishta, D. Vaufreydaz and A. Spalanzani
 9th Workshop on Planning, Perception and Navigation for Intelligent Vehicles at the IEEE International Conference on Intelligent Robots and Systems, Vancouver, Canada, September 2017
@inproceedings{vasishta:hal-01578741,
  title = {{Urban Pedestrian Behaviour Modelling using Natural Vision and Potential Fields}},
  author = {Vasishta, Pavan and Vaufreydaz, Dominique and Spalanzani, Anne},
  booktitle = {{9th Workshop on Planning, Perception and Navigation for Intelligent Vehicles at the IEEE International Conference on Intelligent Robots and Systems}},
  hal_version = {v1},
  hal_id = {hal-01578741},
  month = {September},
  year = {2017},
  address = {Vancouver, Canada},
  url = {https://inria.hal.science/hal-01578741},
  abstract = {},
}
- Journal Routines and informal situations in children's daily lives
 S. Depeau, S. Chardonnel, I. I. André-Poyaud, A. Lepetit, F. Jambon, E. Quesseveur, J. Gombaud, T. Allard and C. Choquet
 Travel Behaviour and Society, vol. 9, pp. 70-80, September 2017
@article{depeau:halshs-01589460,
  title = {{Routines and informal situations in children's daily lives}},
  author = {Depeau, Sandrine and Chardonnel, Sonia and Andr{\'e}-Poyaud, Isabelle I. and Lepetit, Arnaud and Jambon, Francis and Quesseveur, Erwan and Gombaud, J{\'e}r{\^o}me and Allard, Th{\'e}odora and Choquet, Charles-Antoine},
  journal = {{Travel Behaviour and Society}},
  hal_version = {v1},
  hal_id = {halshs-01589460},
  keywords = {mobility ; children},
  doi = {10.1016/j.tbs.2017.06.003},
  month = {September},
  year = {2017},
  pages = {70-80},
  volume = {9},
  publisher = {{Elsevier}},
  url = {https://shs.hal.science/halshs-01589460},
  abstract = {},
}
- Proceedings An Approach for the Analysis of Perceptual and Gestural Performance During Critical Situations
 Y. Bourrier, F. Jambon, C. Garbay and V. Luengo
 EC-TEL 2017, pp. 373--378, Tallinn, Estonia, September 2017
@inproceedings{bourrier:hal-01578390,
  title = {{An Approach for the Analysis of Perceptual and Gestural Performance During Critical Situations}},
  author = {Bourrier, Yannick and Jambon, Francis and Garbay, Catherine and Luengo, Vanda},
  booktitle = {{EC-TEL 2017}},
  hal_version = {v1},
  hal_id = {hal-01578390},
  keywords = {Neural networks ; Critical situations ; Non-technical skills ; Ill-defined domains},
  doi = {10.1007/978-3-319-66610-5\_29},
  month = {September},
  year = {2017},
  pages = {373--378},
  publisher = {{Springer International Publishing}},
  address = {Tallinn, Estonia},
  url = {https://hal.science/hal-01578390},
  abstract = {Our objective is the design of a Virtual Learning Environment to train a person performing a work activity to acquire non-technical skills during the experience of a critical situation. While the person’s performance level is due to carefully acquired technical skills, how it is maintained in the face of criticality depends on non-technical skills, such as decision-making, situation awareness or stress management. Following previous breakdowns of the domains’ ill-defined aspects, we focus in this paper on the design of an approach to evaluate the variation of a learner’s performance when facing learning situations with varying degrees of criticality, in the domains of driving and midwifery.},
}
- Proceedings Self-Aware Context in Smart Home Pervasive Platforms
 P. Lalanda, E. Gerbert-Gaillard and S. Chollet
 14th IEEE International Conference on Autonomic Computing (ICAC 2017), Columbus, OH, United States, July 2017
@inproceedings{lalanda:hal-01674695,
  title = {{Self-Aware Context in Smart Home Pervasive Platforms}},
  author = {Lalanda, Philippe and Gerbert-Gaillard, Eva and Chollet, St{\'e}phanie},
  booktitle = {{14th IEEE International Conference on Autonomic Computing (ICAC 2017)}},
  hal_version = {v1},
  hal_id = {hal-01674695},
  month = {July},
  year = {2017},
  address = {Columbus, OH, United States},
  url = {https://hal.science/hal-01674695},
  abstract = {},
}
- Proceedings Resource-Oriented Framework for Representing Pervasive Context
 P. Lalanda and C. Escoffier
 2017 IEEE International Congress on Internet of Things (ICIOT), pp. 155-158, Honolulu, HI, United States, June 2017
@inproceedings{lalanda:hal-02023414,
  title = {{Resource-Oriented Framework for Representing Pervasive Context}},
  author = {Lalanda, Philippe and Escoffier, Clement},
  booktitle = {{2017 IEEE International Congress on Internet of Things (ICIOT)}},
  hal_version = {v1},
  hal_id = {hal-02023414},
  month = {June},
  year = {2017},
  pages = {155-158},
  publisher = {{IEEE}},
  address = {Honolulu, HI, United States},
  url = {https://hal.science/hal-02023414},
  abstract = {},
}
- Journal Analyse de connaissances perceptivo-gestuelles dans un Système Tutoriel Intelligent
 B. Toussaint, V. Luengo and F. Jambon
 STICEF (Sciences et Technologies de l'Information et de la Communication pour l'Éducation et la Formation), June 2017
@article{toussaint:hal-01517119,
  title = {{Analyse de connaissances perceptivo-gestuelles dans un Syst{\`e}me Tutoriel Intelligent}},
  author = {Toussaint, Ben-Manson and Luengo, Vanda and Jambon, Francis},
  journal = {{STICEF (Sciences et Technologies de l'Information et de la Communication pour l'{\'E}ducation et la Formation)}},
  hal_version = {v1},
  hal_id = {hal-01517119},
  keywords = {TELEOS ; Intelligent Tutoring System ; percutaneous orthopedic surgery ; Perceptual-gestural knowledge ; Syst{\`e}me Tutoriel Intelligent ; chirurgie orthop{\'e}dique percutan{\'e}e ; Connaissances perceptivo-gestuelles},
  doi = {10.3406/stice.2017.1732},
  month = {June},
  year = {2017},
  publisher = {{ATIEF}},
  url = {https://hal.science/hal-01517119},
  abstract = {To cover the aspects of multimodal knowledge such as perceptual-gestural knowledge, various devices are required. Traces produced by these devices provide rich and often accurate information on learners’ activity. However, those traces are multi-source and heterogeneous and, thus, difficult to process automatically. To facilitate their processing, a formal representation that consistently reflects the multimodal activity to which they are linked is needed. This paper describes our proposal to formalize this type of trace recorded from TELEOS, an Intelligent Tutoring System dedicated to percutaneous orthopedic surgery.},
}
- Proceedings A multi-layered architecture for analysis of non-technical-skills in critical situations
 Y. Bourrier, F. Jambon, C. Garbay and V. Luengo
 AIED 2017, vol. 10331, pp. 463--466, Wuhan, China, June 2017
@inproceedings{bourrier:hal-01517152,
  title = {{A multi-layered architecture for analysis of non-technical-skills in critical situations}},
  author = {Bourrier, Yannick and Jambon, Francis and Garbay, Catherine and Luengo, Vanda},
  booktitle = {{AIED 2017}},
  hal_version = {v1},
  hal_id = {hal-01517152},
  keywords = {non-technical skills ; ill-defined domains ; critical situations ; neural networks},
  doi = {10.1007/978-3-319-61425-0\_41},
  month = {June},
  year = {2017},
  pages = {463--466},
  volume = {10331},
  series = {AIED 2017},
  publisher = {{Springer International Publishing}},
  address = {Wuhan, China},
  url = {https://hal.science/hal-01517152},
  abstract = {In most technical domains, it is a worker’s technical expertise which determines how they assess and respond to situations. However, their performance is also influenced by meta-cognitive abilities, such as situation awareness and decision-making, or personal resource skills such as stress and fatigue management. These forms of expertise are commonly described as non-technical skills. Studies have shown that while these skills almost always complement technical activity, they are most influential during critical situations, where usual technical procedures cannot be successfully applied. The MacCoy-Critical project will design an Intelligent Learning Environment able to diagnose a learner’s non-technical skills in critical situations inside a virtual environment, in the domains of driving and delivery handling by midwives. This diagnosis should in turn allow the architecture to generate adapted immediate feedback, as well as to provide new critical learning situations adapted to the learner’s skills. As part of the project, this article focuses on the challenges raised by the diagnosis of non-technical skills inside a virtual environment. We propose a general architecture which aims to extract information concerning the influence of non-technical skills from learners’ activity, assuming the technical skills are already acquired. This article presents the conceptual underpinnings behind the proposed architecture.},
}
- Proceedings Conflict Management in Service-Oriented Pervasive Platforms
 P. Lalanda, R. B. Hadj, C. Hamon and G. Vega
 2017 IEEE International Conference on Services Computing (SCC), pp. 249-256, Honolulu, HI, United States, June 2017
@inproceedings{lalanda:hal-02023427,
  title = {{Conflict Management in Service-Oriented Pervasive Platforms}},
  author = {Lalanda, Philippe and Hadj, Rania Ben and Hamon, Catherine and Vega, German},
  booktitle = {{2017 IEEE International Conference on Services Computing (SCC)}},
  hal_version = {v1},
  hal_id = {hal-02023427},
  month = {June},
  year = {2017},
  pages = {249-256},
  publisher = {{IEEE}},
  address = {Honolulu, HI, United States},
  url = {https://hal.science/hal-02023427},
  abstract = {},
}
- Proceedings Making Movies from Make-Believe Games
 A. Barbulescu, A. Begault, L. Boissieux, M. Cani, M. Garcia, M. Portaz, A. Viand, P. Heinish, R. Dulery, R. Ronfard and D. Vaufreydaz
 WICED 2017 - 6th Workshop on Intelligent Cinematography and Editing (WICED 2017), Lyon, France, April 2017
@inproceedings{barbulescu:hal-01518981,
  title = {{Making Movies from Make-Believe Games}},
  author = {Barbulescu, Adela and Begault, Antoine and Boissieux, Laurence and Cani, Marie-Paule and Garcia, Maxime and Portaz, Maxime and Viand, Alexis and Heinish, Pierre and Dulery, Romain and Ronfard, R{\'e}mi and Vaufreydaz, Dominique},
  booktitle = {{WICED 2017 - 6th Workshop on Intelligent Cinematography and Editing (WICED 2017)}},
  hal_version = {v2},
  hal_id = {hal-01518981},
  pdf = {https://inria.hal.science/hal-01518981v2/file/make-movies-public.pdf},
  keywords = {Interaction styles ; User Interfaces ; Three-Dimensional Graphics and Realism ; Animation},
  doi = {10.2312/wiced.20171074},
  month = {April},
  year = {2017},
  publisher = {{The Eurographics Association}},
  address = {Lyon, France},
  url = {https://inria.hal.science/hal-01518981},
  abstract = {Pretend play is a storytelling technique, naturally used from very young ages, which relies on object substitution to represent the characters of the imagined story. We propose "Make-believe", a system for making movies from pretend play by using 3D printed figurines as props. We capture the rigid motions of the figurines and the gestures and facial expressions of the storyteller using Kinect cameras and IMU sensors and transfer them to the virtual story-world. As a proof-of-concept, we demonstrate our system with an improvised story involving a prince and a witch, which was successfully recorded and transferred into 3D animation.},
}
- Proceedings A self-aware approach to context management in pervasive platforms
 E. Gerbert-Gaillard, P. Lalanda, S. Chollet and J. Demarchez
 PerCom Workshops 2017, Kona, Big Island, HI, United States, March 2017
@inproceedings{eva:hal-01535547,
  title = {{A self-aware approach to context management in pervasive platforms}},
  author = {Gerbert-Gaillard, Eva and Lalanda, Philippe and Chollet, St{\'e}phanie and Demarchez, J{\'e}r{\'e}mie},
  booktitle = {{PerCom Workshops 2017}},
  hal_version = {v1},
  hal_id = {hal-01535547},
  keywords = {Context ; software engineering ; self-awareness ; pervasive computing},
  doi = {10.1109/PERCOMW.2017.7917568},
  month = {March},
  year = {2017},
  address = {Kona, Big Island, HI, United States},
  url = {https://hal.science/hal-01535547},
  abstract = {Pervasive computing envisions environments where computers are blended into everyday objects in order to provide added-value services to people. Already today, there is a growing number of advanced embedded systems around us, extended with computing and communication capabilities. However, pervasive applications raise major challenges in terms of software engineering and remain hard to develop, deploy, execute, and maintain. In particular, smart home platforms must be autonomic because their management cannot be done by end users who do not possess the necessary skills. In this paper, we propose to handle context management in a service-oriented pervasive platform by defining a self-aware solution and associated mechanisms. Our approach is illustrated with a smart home example implemented in our pervasive platform iCasa.},
}
- Proceedings A system for creating virtual reality content from make-believe games
 A. Barbulescu, M. Garcia, A. Begault, L. Boissieux, M. Cani, M. Portaz, A. Viand, R. Dulery, P. Heinish, R. Ronfard and D. Vaufreydaz
 IEEE Virtual Reality 2017, pp. 207-208, Los Angeles, United States, March 2017
@inproceedings{barbulescu:hal-01578326,
  title = {{A system for creating virtual reality content from make-believe games}},
  author = {Barbulescu, Adela and Garcia, Maxime and Begault, Antoine and Boissieux, Laurence and Cani, Marie-Paule and Portaz, Maxime and Viand, Alexis and Dulery, Romain and Heinish, Pierre and Ronfard, R{\'e}mi and Vaufreydaz, Dominique},
  booktitle = {{IEEE Virtual Reality 2017}},
  hal_version = {v1},
  hal_id = {hal-01578326},
  pdf = {https://inria.hal.science/hal-01578326v1/file/CreatingVRContentFromMake-BelieveGames.pdf},
  keywords = {IMU sensors ; 3D printed figurines ; 3D animation ; Speech ; facial expressions ; Kinect cameras ; make-believe games ; storyteller ; virtual reality content creation ; virtualized story ; Cameras ; Head ; Magnetic heads ; computer animation ; computer games ; virtual reality ; Sensors},
  doi = {10.1109/VR.2017.7892249},
  month = {March},
  year = {2017},
  pages = {207-208},
  publisher = {{IEEE}},
  address = {Los Angeles, United States},
  url = {https://inria.hal.science/hal-01578326},
  abstract = {Pretend play is a storytelling technique, naturally used from very young ages, which relies on object substitution to represent the characters of the imagined story. We propose a system which assists the storyteller by generating a virtualized story from a recorded dialogue performed with 3D printed figurines. We capture the gestures and facial expressions of the storyteller using Kinect cameras and IMU sensors and transfer them to their virtual counterparts in the story-world. As a proof-of-concept, we demonstrate our system with an improvised story involving a prince and a witch, which was successfully recorded and transferred into 3D animation.},
}
- Proceedings Context-based conflict management in pervasive platforms
 R. Ben Hadj, C. Hamon, S. Chollet, G. E. Vega Baez and P. Lalanda
 2017 IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2017, Kona, Big Island, HI, United States, March 2017
@inproceedings{benhadj:hal-01898642,
  title = {{Context-based conflict management in pervasive platforms}},
  author = {Ben Hadj, Rania and Hamon, Catherine and Chollet, St{\'e}phanie and Vega Baez, German Eduardo and Lalanda, Philippe},
  booktitle = {{2017 IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2017}},
  hal_version = {v1},
  hal_id = {hal-01898642},
  month = {March},
  year = {2017},
  address = {Kona, Big Island, HI, United States},
  url = {https://hal.science/hal-01898642},
  abstract = {},
}
- Proceedings Device installation in smart homes
 C. Hamon, V. Lestideau, G. Vega and P. Lalanda
 2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), pp. 74-75, Kona, Big Island, HI, United States, March 2017
@inproceedings{hamon:hal-02014699,
  title = {{Device installation in smart homes}},
  author = {Hamon, Catherine and Lestideau, Vincent and Vega, German and Lalanda, Philippe},
  booktitle = {{2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops)}},
  hal_version = {v1},
  hal_id = {hal-02014699},
  month = {March},
  year = {2017},
  pages = {74-75},
  publisher = {{IEEE}},
  address = {Kona, Big Island, HI, United States},
  url = {https://hal.science/hal-02014699},
  abstract = {},
}
- Journal Autonomic Mediation Middleware for Smart Manufacturing
 P. Lalanda, D. Morand and S. Chollet
 IEEE Internet Computing, January 2017
@article{lalanda:hal-01898625,
  title = {{Autonomic Mediation Middleware for Smart Manufacturing}},
  author = {Lalanda, Philippe and Morand, Denis and Chollet, St{\'e}phanie},
  journal = {{IEEE Internet Computing}},
  hal_version = {v1},
  hal_id = {hal-01898625},
  doi = {10.1109/mic.2017.18},
  month = {January},
  year = {2017},
  publisher = {{Institute of Electrical and Electronics Engineers}},
  url = {https://hal.science/hal-01898625},
  abstract = {},
}
- Proceedings Modèles de raisonnement pour le diagnostic et le feedback dans l'apprentissage de la gestion des compétences non techniques en situation critique
 Y. Bourrier, F. Jambon, C. Garbay and V. Luengo
 Réalités mixtes, virtuelles et augmentées pour l'apprentissage : perspectives et challenges pour la conception, l'évaluation et le suivi (Atelier des ORPHEE-RDV 2017), Font-Romeu, France, January 2017
@inproceedings{bourrier:hal-02053483,
  title = {{Mod{\`e}les de raisonnement pour le diagnostic et le feedback dans l'apprentissage de la gestion des comp{\'e}tences non techniques en situation critique}},
  author = {Bourrier, Yannick and Jambon, Francis and Garbay, Catherine and Luengo, Vanda},
  booktitle = {{R{\'e}alit{\'e}s mixtes, virtuelles et augment{\'e}es pour l'apprentissage : perspectives et challenges pour la conception, l'{\'e}valuation et le suivi (Atelier des ORPHEE-RDV 2017)}},
  hal_version = {v1},
  hal_id = {hal-02053483},
  month = {January},
  year = {2017},
  address = {Font-Romeu, France},
  url = {https://hal.univ-grenoble-alpes.fr/hal-02053483},
  abstract = {The ANR MacCoy project (ANR-14-CE24-0021) addresses the scientific challenges posed by learning non-technical skills in technical situations within virtual worlds, in the domains of medicine and driving. Within the project, our contributions focus on diagnosing the learner's knowledge from the traces generated by their activity inside the simulator, and on decision-making for generating feedback that can be immediate (a real-time response to a learner's action) or deferred (in the form of an instruction for generating a new critical situation).},
}
- Journal The Smartphone-Based Offline Indoor Location Competition at IPIN 2016: Analysis and Future Work
 J. Torres-Sospedra, A. Jiménez, S. Knauth, A. Moreira, Y. K. Beer, T. Fetzer, V. Ta, R. M. Montoliu, F. Seco, G. M. Mendoza-Silva, O. Belmonte, A. Koukofikis, M. J. Nicolau, A. Costa, F. M. Meneses, F. Ebner, F. Deinzer, D. Vaufreydaz, T. Dao and E. Castelli
 Sensors, vol. 17, no. 3, art. 557, 2017
@article{torressospedra:hal-01490744,
  title = {{The Smartphone-Based Offline Indoor Location Competition at IPIN 2016: Analysis and Future Work}},
  author = {Torres-Sospedra, Joaqu{\'i}n and Jim{\'e}nez, Antonio and Knauth, Stefan and Moreira, Adriano and Beer, Yair K and Fetzer, Toni and Ta, Viet-Cuong and Montoliu, Raul M and Seco, Fernando and Mendoza-Silva, Germ{\'a}n M and Belmonte, Oscar and Koukofikis, Athanasios and Nicolau, Maria Jo{\~a}o and Costa, Ant{\'o}nio and Meneses, Filipe M and Ebner, Frank and Deinzer, Frank and Vaufreydaz, Dominique and Dao, Trung-Kien and Castelli, Eric},
  journal = {{Sensors}},
  hal_version = {v1},
  hal_id = {hal-01490744},
  pdf = {https://inria.hal.science/hal-01490744v1/file/sensors-17-00557.pdf},
  keywords = {indoor localization technology ; indoor navigation ; smartphone applications ; evaluation and benchmarking},
  doi = {10.3390/s17030557},
  year = {2017},
  volume = {17},
  number = {3},
  pages = {557},
  publisher = {{MDPI}},
  url = {https://inria.hal.science/hal-01490744},
  abstract = {This paper presents the analysis and discussion of the off-site localization competition track, which took place during the Seventh International Conference on Indoor Positioning and Indoor Navigation (IPIN 2016). Five international teams proposed different strategies for smartphone-based indoor positioning using the same reference data. The competitors were provided with several smartphone-collected signal datasets, some of which were used for training (known trajectories), and others for evaluating (unknown trajectories). The competition permits a coherent evaluation method of the competitors' estimations, where inside information to fine-tune their systems is not offered, and thus provides, in our opinion, a good starting point to introduce a fair comparison between the smartphone-based systems found in the literature. The methodology, experience, feedback from competitors and future working lines are described.},
}
- incollection Leveraging Design and Runtime Architecture Models to Support Self-awareness
 P. Lalanda, S. Chollet and C. Hamon
 Self-Aware Computing Systems, 2017
@incollection{lalanda:hal-01898636,
  title = {{Leveraging Design and Runtime Architecture Models to Support Self-awareness}},
  author = {Lalanda, Philippe and Chollet, St{\'e}phanie and Hamon, Catherine},
  booktitle = {{Self-Aware Computing Systems}},
  hal_version = {v1},
  hal_id = {hal-01898636},
  year = {2017},
  url = {https://hal.science/hal-01898636},
}