Applications for Video Action Recognition and Prediction Special Session


Nino Cauli – University of Catania, Italy
Mirko Raković – University of Novi Sad, Serbia
Diego Reforgiato Recupero - University of Cagliari, Italy

Intelligent systems able to monitor and interact with their surroundings are experiencing rapid growth in modern society. Among the various sensors available, RGB and RGB-D cameras offer the best trade-off between cost and amount of information provided. Through cameras, intelligent systems can recognise and predict the actions of the humans around them. The ability to understand and predict observed actions is a fundamental skill in social interaction: body movements carry much information about the intentions and identity of the observed actors, and vision is the main sensory system humans use to recognise and predict actions. Recognising and predicting human actions from videos are fundamental abilities in several cutting-edge applications: self-driving cars must predict pedestrians' behaviour; video surveillance systems must detect criminal actions; collaborative and humanoid robots need to detect human motion in shared environments; medical monitoring systems need to check the proper execution of exercises performed by patients; and full-body game controllers for virtual reality need to recognise the actions of their users.
The field of video action recognition and prediction encompasses a broad set of sub-problems. In action prediction, instead of the video of the entire action being available, only an initial portion of the performed action is given as input, making the task more challenging than recognition. One step further is the prediction of the expected sensory input generated by the observed action, in the form of a video sequence. Another important recent research direction is self-action recognition and prediction from wearable cameras: in this case, first-person videos are used to recognise the action performed by the user, an application with high impact in health-care and sport monitoring systems. Last but not least is the problem of scaling existing algorithms to distributed camera systems.
Early algorithms used handcrafted features to recognise and predict actions from videos, but the last decade has seen a shift to Deep Learning (DL) architectures. Researchers are actively investigating new DL models specifically crafted to recognise actions from video sequences, where the time domain plays an important role. While several DL models able to extract features from single images already exist, fewer take into account the temporal information embedded in video sequences. Recurrent Neural Networks (RNNs), the state of the art for speech and language processing, are starting to be used in conjunction with DL models for video processing, with promising results.
DL for video action recognition and prediction is a highly relevant research topic. Several video datasets are already available, recorded in both constrained and unconstrained conditions and with RGB or RGB-D cameras. In past years, challenges on video activity recognition have been hosted at international visual processing conferences such as CVPR and ECCV.
The AVARP special session will be an occasion to expose the Distributed Computing community to the problem of video action recognition and prediction applied to Intelligent Systems, promoting the exchange of new ideas and stimulating collaborations. The goal of this special session is to gather researchers working on different areas of video action recognition and prediction in order to stimulate discussion. Special emphasis will be placed on analysing the interaction between DL and video action recognition and prediction.
The AVARP special session focuses on the following topics:
  • Deep Learning for video action recognition and prediction
  • Distributed video action recognition for surveillance systems
  • Action based expected visual sensory prediction
  • Video action trajectory prediction
  • Egocentric action recognition and prediction
  • Long-term action prediction
  • Deep Recurrent Neural Networks for video action prediction
  • Introduction of new datasets, benchmarks and challenges for video action recognition and prediction

Submission of Papers


All accepted papers will be included in the Symposium Proceedings, which will be published by Springer as part of their series Studies in Computational Intelligence.

Full papers must be at most 12 pages long, short papers at most 6 pages, and posters at most 3 pages; all submissions must be formatted according to the Springer format.

Submissions and reviews are automatically handled by EasyChair. Please submit your paper at:

During the submission process, please select this Special Session as the topic "AVARP - Applications for Video Action Recognition and Prediction" in EasyChair.

Important Dates

May 10th, 2020 Paper submission
May 18th, 2020 Notification of acceptance 
June 5th, 2020 Final paper submission
September 21st-23rd, 2020 Symposium dates

TPC Members

  • Alexandre Bernardino, ISR, Instituto Superior Técnico, Lisbon, Portugal
  • Naimul Khan, Ryerson University, Toronto, Canada
  • Lorenzo Jamone, Queen Mary University of London, United Kingdom
  • Egidio Falotico, Scuola Superiore Sant'Anna, Pisa, Italy
  • Giovanni Maria Farinella, University of Catania, Italy
  • Sebastiano Battiato, University of Catania, Italy
  • Rubén Alonso, R2M Solution, Italy
  • Daniele Riboni, University of Cagliari, Italy
  • Silvio Barra, University of Cagliari, Italy