Publications

List of research publications

2026

  1. Sign-SALD: A Skeleton-Aware Latent Diffusion Model for Text-driven Sign Language Production
    Jiayu Shen, Kalin Stefanov, Lay-Ki Soon, Vee Yee Chong, and KokSheik Wong
    In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2026
  2. PhysHDR: When Lighting Meets Materials and Scene Geometry in HDR Reconstruction
    Hrishav Bakul Barua, Kalin Stefanov, Ganesh Krishnasamy, KokSheik Wong, and Abhinav Dhall
    In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2026
  3. AuslanSpell: An Interactive Technology for Improving Auslan Fingerspelling Comprehension
    Kalin Stefanov, Andre Pham, Antony Loose, Lucy Robertson-Bell, and Louisa Willoughby
    In Proceedings of the ACM Conference on Human Factors in Computing Systems, 2026
  4. DexAvatar: 3D Sign Language Reconstruction with Hand and Body Pose Priors
    Kaustubh Kundu, Hrishav Bakul Barua, Lucy Robertson-Bell, Zhixi Cai, and Kalin Stefanov
    In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2026

2025

  1. S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction
    Mohammad Adiban, Kalin Stefanov, Sabato Marco Siniscalchi, and Giampiero Salvi
    IEEE Transactions on Multimedia, 2025
  2. Sign-MExD: An Expert-Infused Diffusion Model for Sign Language Production
    Jiayu Shen, Kalin Stefanov, Vee Yee Chong, Lay-Ki Soon, and KokSheik Wong
    In Proceedings of the APSIPA Annual Summit and Conference, 2025
  3. Enhancing Tactile Learning: A Co-Designed System for Supporting Speech Interaction with Multi-Part 3D Printed Models by Students who are Blind
    Ruth Galan Nagassa, Andre Ky Pham, Matthew Butler, Leona Holloway, Kalin Stefanov, Skye Vent, and Kim Marriott
    In Proceedings of the ACM Conference on Human Factors in Computing Systems, 2025
  4. GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction
    Hrishav Bakul Barua, Kalin Stefanov, KokSheik Wong, Abhinav Dhall, and Ganesh Krishnasamy
    In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025
  5. Do Blind Spots Matter for Word-Referent Mapping? A Computational Study with Infant Egocentric Video
    Zekai Shi, Zhixi Cai, and Kalin Stefanov
    2025

2024

  1. Participation Role-Driven Engagement Estimation of ASD Individuals in Neurodiverse Group Discussions
    Kalin Stefanov, Yukiko I. Nakano, Chisa Kobayashi, Ibuki Hoshina, Tatsuya Sakato, Fumio Nihei, Chihiro Takayama, Ryo Ishii, and Masatsugu Tsujii
    In Proceedings of the ACM International Conference on Multimodal Interaction, 2024
  2. AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset
    Zhixi Cai, Shreya Ghosh, Aman Pankaj Adatia, Munawar Hayat, Abhinav Dhall, Tom Gedeon, and Kalin Stefanov
    In Proceedings of the ACM International Conference on Multimedia, 2024
  3. 1M-Deepfakes Detection Challenge
    Zhixi Cai, Abhinav Dhall, Shreya Ghosh, Munawar Hayat, Dimitrios Kollias, Kalin Stefanov, and Usman Tariq
    In Proceedings of the ACM International Conference on Multimedia, 2024
  4. HistoHDR-Net: Histogram Equalization for Single LDR to HDR Image Translation
    Hrishav Bakul Barua, Ganesh Krishnasamy, KokSheik Wong, Abhinav Dhall, and Kalin Stefanov
    In Proceedings of the IEEE International Conference on Image Processing, 2024
  5. LLM-HDR: Bridging LLM-based Perception and Self-Supervision for Unpaired LDR-to-HDR Image Reconstruction
    Hrishav Bakul Barua, Kalin Stefanov, Lemuel Lai En Che, Abhinav Dhall, KokSheik Wong, and Ganesh Krishnasamy
    2024
  6. Human Brain Exhibits Distinct Patterns When Listening to Fake Versus Real Audio: Preliminary Evidence
    Mahsa Salehi, Kalin Stefanov, and Ehsan Shareghi
    2024

2023

  1. MARLIN: Masked Autoencoder for Facial Video Representation Learning
    Zhixi Cai, Shreya Ghosh, Kalin Stefanov, Abhinav Dhall, Jianfei Cai, Hamid Rezatofighi, Reza Haffari, and Munawar Hayat
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
  2. Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization
    Zhixi Cai, Shreya Ghosh, Abhinav Dhall, Tom Gedeon, Kalin Stefanov, and Munawar Hayat
    Computer Vision and Image Understanding, 2023
  3. ArtHDR-Net: Perceptually Realistic and Accurate HDR Content Creation
    Hrishav Bakul Barua, Ganesh Krishnasamy, KokSheik Wong, Kalin Stefanov, and Abhinav Dhall
    In Proceedings of the APSIPA Annual Summit and Conference, 2023

2022

  1. Graph-Based Group Modelling for Backchannel Detection
    Garima Sharma, Kalin Stefanov, Abhinav Dhall, and Jianfei Cai
    In Proceedings of the ACM International Conference on Multimedia, 2022
  2. Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization
    Zhixi Cai, Kalin Stefanov, Abhinav Dhall, and Munawar Hayat
    In Proceedings of the International Conference on Digital Image Computing: Techniques and Applications, 2022
  3. Hierarchical Residual Learning Based Vector Quantized Variational Autoencoder for Image Reconstruction and Generation
    Mohammad Adiban, Kalin Stefanov, Sabato M. Siniscalchi, and Giampiero Salvi
    In Proceedings of the British Machine Vision Conference, 2022
  4. Visual Representations of Physiological Signals for Fake Video Detection
    Kalin Stefanov, Bhawna Paliwal, and Abhinav Dhall
    2022

2021

  1. Spatial Bias in Vision-Based Voice Activity Detection
    Kalin Stefanov, Mohammad Adiban, and Giampiero Salvi
    In Proceedings of the International Conference on Pattern Recognition, 2021
  2. Analysis of Behavior Classification in Motivational Interviewing
    Leili Tavabi, Trang Tran, Kalin Stefanov, Brian Borsari, Joshua Woolley, Stefan Scherer, and Mohammad Soleymani
    In Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access, 2021
  3. Group-Level Focus of Visual Attention for Improved Next Speaker Prediction
    Chris Birmingham, Kalin Stefanov, and Maja Mataric
    In Proceedings of the ACM International Conference on Multimedia, 2021
  4. Group-Level Focus of Visual Attention for Improved Active Speaker Detection
    Christopher Birmingham, Maja Mataric, and Kalin Stefanov
    In Proceedings of the ACM International Conference on Multimodal Interaction, 2021

2020

  1. Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially Aware Language Acquisition
    Kalin Stefanov, Jonas Beskow, and Giampiero Salvi
    IEEE Transactions on Cognitive and Developmental Systems, 2020
  2. Emotion or Expressivity? An Automated Analysis of Nonverbal Perception in a Social Dilemma
    Su Lei, Kalin Stefanov, and Jonathan Gratch
    In Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, 2020
  3. Multimodal Automatic Coding of Client Behavior in Motivational Interviewing
    Leili Tavabi, Kalin Stefanov, Larry Zhang, Brian Borsari, Joshua D. Woolley, Stefan Scherer, and Mohammad Soleymani
    In Proceedings of the ACM International Conference on Multimodal Interaction, 2020
  4. OpenSense: A Platform for Multimodal Data Acquisition and Behavior Perception
    Kalin Stefanov, Baiyu Huang, Zongjian Li, and Mohammad Soleymani
    In Proceedings of the ACM International Conference on Multimodal Interaction, 2020

2019

  1. Modeling of Human Visual Attention in Multiparty Open-World Dialogues
    Kalin Stefanov, Giampiero Salvi, Dimosthenis Kontogiorgos, Hedvig Kjellström, and Jonas Beskow
    ACM Transactions on Human-Robot Interaction, 2019
  2. Towards Digitally-Mediated Sign Language Communication
    Kalin Stefanov and Mayumi Bono
    In Proceedings of the International Conference on Human-Agent Interaction, 2019
  3. Multimodal Learning for Identifying Opportunities for Empathetic Responses
    Leili Tavabi, Kalin Stefanov, Setareh Nasihati Gilani, David Traum, and Mohammad Soleymani
    In Proceedings of the ACM International Conference on Multimodal Interaction, 2019
  4. Multimodal Analysis and Estimation of Intimate Self-Disclosure
    Mohammad Soleymani, Kalin Stefanov, Sin-Hwa Kang, Jan Ondras, and Jonathan Gratch
    In Proceedings of the ACM International Conference on Multimodal Interaction, 2019

2018

  1. Recognition and Generation of Communicative Signals: Modeling of Hand Gestures, Speech Activity and Eye-Gaze in Human-Machine Interaction
    Kalin Stefanov
    KTH Royal Institute of Technology, 2018

2017

  1. Vision-based Active Speaker Detection in Multiparty Interaction
    Kalin Stefanov, Jonas Beskow, and Giampiero Salvi
    In Proceedings of the International Workshop on Grounding Language Understanding, 2017
  2. A Real-time Gesture Recognition System for Isolated Swedish Sign Language Signs
    Kalin Stefanov and Jonas Beskow
    In Proceedings of the European and Nordic Symposium on Multimodal Communication, 2017

2016

  1. A Multi-party Multi-modal Dataset for Focus of Visual Attention in Human-human and Human-robot Interaction
    Kalin Stefanov and Jonas Beskow
    In Proceedings of the International Conference on Language Resources and Evaluation, 2016
  2. Look Who’s Talking: Visual Identification of the Active Speaker in Multi-Party Human-Robot Interaction
    Kalin Stefanov, Akihiro Sugimoto, and Jonas Beskow
    In Proceedings of the International Workshop on Advancements in Social Signal Processing for Multimodal Interaction, 2016
  3. Gesture Recognition System for Isolated Sign Language Signs
    Kalin Stefanov and Jonas Beskow
    In Proceedings of the European and Nordic Symposium on Multimodal Communication, 2016

2015

  1. Public Speaking Training with a Multimodal Interactive Virtual Audience Framework
    Mathieu Chollet, Kalin Stefanov, Helmut Prendinger, and Stefan Scherer
    In Proceedings of the ACM International Conference on Multimodal Interaction, 2015

2014

  1. Tutoring Robots
    Samer Al Moubayed, Jonas Beskow, Bajibabu Bollepalli, Ahmed Hussen-Abdelaziz, Martin Johansson, Maria Koutsombogera, José David Lopes, Jekaterina Novikova, Catharine Oertel, Gabriel Skantze, Kalin Stefanov, and Gül Varol
    In Innovative and Creative Developments in Multimodal Interaction Systems, 2014
  2. Human-Robot Collaborative Tutoring using Multiparty Multimodal Spoken Dialogue
    Samer Al Moubayed, Jonas Beskow, Bajibabu Bollepalli, Joakim Gustafson, Ahmed Hussen-Abdelaziz, Martin Johansson, Maria Koutsombogera, José David Lopes, Jekaterina Novikova, Catharine Oertel, Gabriel Skantze, Kalin Stefanov, and Gül Varol
    In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, 2014
  3. The Tutorbot Corpus — A Corpus for Studying Tutoring Behaviour in Multiparty Face-to-Face Spoken Dialogue
    Maria Koutsombogera, Samer Al Moubayed, Bajibabu Bollepalli, Ahmed Hussen Abdelaziz, Martin Johansson, José David Aguas Lopes, Jekaterina Novikova, Catharine Oertel, Kalin Stefanov, and Gül Varol
    In Proceedings of the International Conference on Language Resources and Evaluation, 2014
  4. A Data-Driven Approach to Detection of Interruptions in Human-Human Conversations
    Raveesh Meena, Saeed Dabbaghchian, and Kalin Stefanov
    In Proceedings of FONETIK, 2014
  5. Tivoli - Learning Signs Through Games and Interaction for Children with Communicative Disorders
    Jonas Beskow, Simon Alexanderson, Kalin Stefanov, Britt Claesson, Sandra Derbring, Morgan Fredriksson, J. Starck, and E. Axelsson
    In Proceedings of the Biennial Conference of the International Society for Augmentative and Alternative Communication, 2014

2013

  1. A Kinect Corpus of Swedish Sign Language Signs
    Kalin Stefanov and Jonas Beskow
    In Proceedings of the Workshop on Multimodal Corpora: Beyond Audio and Video, 2013
  2. The Tivoli System – A Sign-driven Game for Children with Communicative Disorders
    Jonas Beskow, Simon Alexanderson, Kalin Stefanov, Britt Claesson, Sandra Derbring, and Morgan Fredriksson
    In Proceedings of the Symposium on Multimodal Communication, 2013
  3. Web-Enabled 3D Talking Avatars Based on WebGL and HTML5
    Jonas Beskow and Kalin Stefanov
    In Proceedings of the International Conference on Intelligent Virtual Agents, 2013

2012

  1. Multimodal Multiparty Social Interaction with the Furhat Head
    Samer Al Moubayed, Gabriel Skantze, Jonas Beskow, Kalin Stefanov, and Joakim Gustafson
    In Proceedings of the ACM International Conference on Multimodal Interaction, 2012
  2. Socially Aware Many-to-Machine Communication
    Florian Eyben, Emer Gilmartin, Cyril Joder, Erik Marchi, Christian Munier, Kalin Stefanov, Felix Weninger, and Björn Schuller
    In Proceedings of the International Summer Workshop on Multimodal Interfaces, 2012

2011

  1. D2.4.3 Spreading Activation Components v3 - LarKC Project Deliverable
    Maurice Grinberg, Hristo Stefanov, Kalin Stefanov, and Ivan Peikov
    2011

2010

  1. Webcam-based Eye Gaze Tracking under Natural Head Movement
    Kalin Stefanov
    University of Amsterdam, 2010