A unified approach to the recognition of complex actions from sequences of zone-crossings

Sanromà, G., Patino, L. ORCID: https://orcid.org/0000-0002-6716-0629, Burghouts, G., Schutte, K. and Ferryman, J. (2014) A unified approach to the recognition of complex actions from sequences of zone-crossings. Image and Vision Computing, 32 (5). pp. 363-378. ISSN 0262-8856

Full text not archived in this repository.

It is advisable to refer to the publisher's version if you intend to cite from this work.

To link to this item DOI: 10.1016/j.imavis.2014.02.005

Abstract/Summary

We present a method for recognising complex actions that combines automatic learning of simple actions with manual definition of complex actions in a single grammar. In contrast to the general trend in complex action recognition, which divides recognition into two stages, our method recognises simple and complex actions in a unified way. It does so by encoding the simple-action HMMs within the stochastic grammar that models the complex actions. This unified approach lets the higher activity layers influence the recognition of simple actions more effectively, which leads to a substantial improvement in the classification of complex actions. We consider the recognition of complex actions based on person transits between areas of the scene. As input, our method receives the crossings of tracks through a set of zones that are derived using unsupervised learning of the movement patterns of the objects in the scene. We evaluate our method on a large dataset showing normal, suspicious and threat behaviour in a parking lot. Experiments show an improvement of approximately 30% in the recognition of both high-level scenarios and their constituent simple actions with respect to a two-stage approach. Experiments with synthetic noise simulating the most common tracking failures show that our method suffers only a limited decrease in performance when moderate amounts of noise are added.
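To make the input representation concrete, the following is a minimal sketch of how a track could be turned into a sequence of zone-crossings. It is an illustration only, not the authors' implementation: the rectangular zones, their names and coordinates, the helper functions `zone_of` and `zone_crossings`, and the example track are all hypothetical. In the paper the zones are learned without supervision, whereas here they are written by hand.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

# Hypothetical hand-written zones (assumption: axis-aligned rectangles).
# The paper derives its zones by unsupervised learning instead.
ZONES: Dict[str, Box] = {
    "car_parking": (0.0, 0.0, 10.0, 10.0),
    "service_area": (10.0, 0.0, 20.0, 10.0),
    "truck_parking": (20.0, 0.0, 30.0, 10.0),
}


def zone_of(point: Point) -> Optional[str]:
    """Return the name of the zone containing the point, or None."""
    x, y = point
    for name, (x0, y0, x1, y1) in ZONES.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return None


def zone_crossings(track: List[Point]) -> List[Tuple[str, str]]:
    """Emit a (from_zone, to_zone) pair each time the track changes zone.

    Points outside every zone are skipped, so the last known zone is
    kept until the track enters another zone.
    """
    crossings: List[Tuple[str, str]] = []
    previous: Optional[str] = None
    for point in track:
        current = zone_of(point)
        if current is None:
            continue
        if previous is not None and current != previous:
            crossings.append((previous, current))
        previous = current
    return crossings


# Hypothetical track moving left to right across the three zones.
track = [(2.0, 5.0), (8.0, 5.0), (12.0, 5.0), (18.0, 5.0), (25.0, 5.0)]
print(zone_crossings(track))
# [('car_parking', 'service_area'), ('service_area', 'truck_parking')]
```

A sequence of such crossings is the kind of symbol stream that a simple-action HMM could then score. How the HMMs are embedded in the stochastic grammar is described in the paper itself and is not reproduced here.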

Item Type:	Article
Refereed:	Yes
Divisions:	Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science
ID Code:	39812
Uncontrolled Keywords:	Threat recognition; Complex actions; Temporal relations; Multi-threaded parsing; Stochastic parsing
Publisher:	Elsevier

University of Reading

CentAUR: Central Archive at the University of Reading
