Table 3
Complete rule-set used in the experiments. Non-terminal S denotes the starting symbol of
the grammar. Non-terminals in capital letters denote the complex scenarios that we want
to recognise. Underlined symbols correspond to starting symbols of simple actions
(i.e., S(m) in Section 6.1). We use the shortcuts m, p, o, d, f, s, and e to denote the temporal
relations meet, precede, overlap, during, finish, start and equal, respectively. Prior probabilities
for the complex activity rules have been omitted since they have been set to equal
values. Aggression is detected when a potential thief and the truck driver loiter near the
truck at the same time. The following acronyms have been used: CD (car driver),
C (car), TD (truck driver), T (truck), SA (service area), X (undetermined person), TP
(truck parking area), CP (car parking area) and SM (smoking area).
G. Sanromà et al. / Image and Vision Computing 32 (2014) 363–378 377