Mingze Xu, Mingfei Gao, Yi-Ting Chen, Larry S. Davis, David J. Crandall from Indiana U, Maryland U, Honda Research Institute
The actions around us in the real world, can occur at any time, without any warnings. In order to be able to react to our world, we should make and update our inferences in real-time.
Important real-time applications require identifying actions as soon as each video frame arrives, based only on current and historical observations. And there is a novel hypothesis that, explicitly predicting the future can help to better classify actions in the present (although future information is not available right now)
3 Methods (including framework)
The overview of proposed TRN model. It consists of a series of TRNCells, just like RNNCells. And we can divide the TRN Cell into some parts:
Temporal decoder (Bottom right): To estimate the future actions and their corresponding hidden states for the next several time steps (which is a hyperparameter).
Future gate (Upper middle): To model the feature representation of future context.
SpatioTemporal Accumulator (STM) (Upper left): It takes the ht-1 and concat(xt, x’t) in, and updates its hidden states then work out a distribution over candidate actions.
4 Experiments (data corpus, evaluation metrics, compared methods)
Datasets: HDD, TVSeries, THUMOS-14
Evaluation metrics: mAP, cAP
5 Pros. & Cons.
Cons.: There are too many data streams within the TRN model. And this paper was lack of ablation study to prove that every component of TRN can contribute to its good performance.
6 Comments (e. g., improvements)
7 Closely related papers (less than 3)
- Roeland De Geest, et al., Online action detection, in ECCV, 2016.
- Roeland De Geest, et al., Modeling temporal structure with lstm for online action detection, in WACV, 2018.