Aggarwal, J., Ryoo, M., 2011. Human activity analysis: A review. ACM Comput. Surv.
43, 16:1–16:43.
Blank, M., Gorelick, L., Shechtman, E., Irani, M., Basri, R., 2005. Actions as space-time
shapes, in: Computer Vision, 2005. ICCV 2005. Tenth IEEE International
Conference on, Vol. 2, pp. 1395–1402.
Bobick, A., Davis, J., 2001. The recognition of human movement using temporal
templates. IEEE Transactions on Pattern Analysis and Machine Inelligence 23,
257–267.
Ba˘za˘van, E.G., Li, F., Sminchisescu, C., 2012. Learning random kernel approximations
for object recognition 1203.1483.
Campbell, L., Bobick, A., 1995. Recognition of human body motion using phase space
constraints, in: Computer Vision, 1995. Proceedings, Fifth International
Conference on, pp. 624–630.
Camplani, M., Salgado, L., de Imágenes, G., 2012. Efficient spatio-temporal hole
filling strategy for kinect depth maps, in: In Proceedings of SPIE.
Chang, C.C., Lin, C.J., 2011. LIBSVM: A library for support vector machines. ACM
Transactions on Intelligent Systems and Technology 2, 27:1–27:27. Software
available at http://www.csie.ntu.edu.tw/ cjlin/libsvm
Chaovalitwongse, W.A., Pardalos, P.M., 2008. On the time series support vector
machine using dynamic time warping kernel for brain activity classification.
Cybernetics and Sys. Anal. 44, 125–138.
Charles, J., Everingham, M., 2011. Learning shape models for monocular human pose
estimation from the Microsoft Xbox Kinect, in: 2011 IEEE International
Conference on Computer Vision Workshops (ICCVW), pp. 1202–1208.
Chen, L., Wei, H., Ferryman, J., 2013. A survey of human motion analysis using depth
imagery. Pattern Recognition Letters 34, 1995–2006.
Dollár, P., Rabaud, V., Cottrell, G., Belongie, S., 2005. Behavior recognition via sparse
spatio-temporal features. Visual Surveillance and Performance Evaluation of
Tracking and Surveillance (VS-PETS) 0, 65–72.
Gilbert, A., Illingworth, J., Bowden, R., 2008. Scale invariant action recognition using
compound features mined from dense spatio-temporal corners, in: Proceedings
of the 10th European Conference on Computer Vision: Part I. Springer-Verlag,
Berlin, Heidelberg, pp. 222–233.
Gudmundsson, S., Runarsson, T., Sigurdsson, S., 2008. Support vector machines and
dynamic time warping for time series, in: Neural Networks, 2008. IJCNN 2008.
(IEEE World Congress on Computational Intelligence). IEEE International Joint
Conference on, pp. 2772–2776.
Holt, B., Ong, E.J., Cooper, H., Bowden, R., 2011. Putting the pieces together:
Connected poselets for human pose estimation, in: 2011 IEEE International
Conference on Computer Vision Workshops (ICCV Workshops), pp. 1196–1201.
Junejo, I., Dexter, E., Laptev, I., Pérez, P., 2011. View-independent action recognition
from temporal self-similarities. IEEE Trans. Pattern Anal. Mach. Int. 33, 172–
185.
Ke, Y., Sukthankar, R., Hebert, M., 2005. Efficient visual event detection using
volumetric features, in: Proceedings of the Tenth IEEE International Conference
on Computer Vision (ICCV’05) Volume 1 - Volume 01, IEEE Computer Society,
Washington, DC, USA. pp. 166–173.
Kläser, A., Marszalek, M., Schmid, C., 2008. A spatio-temporal descriptor based on
3d-gradients., in: Everingham, M., Needham, C.J., Fraile, R. (Eds.), BMVC, British
Machine Vision Association.
Lai, K., Bo, L., Ren, X., Fox, D., 2011. A large-scale hierarchical multi-view rgb-d
object dataset, in: Robotics and Automation (ICRA), 2011 IEEE International
Conference on, pp. 1817–1824.
Laptev, I., Marszalek, M., Schmid, C., Rozenfeld, B., 2008. Learning realistic human
actions from movies. IEEE Conference on Computer Vision and Pattern
Recognition, CVPR 2008, 1–8.
Li, W., Zhang, Z., Liu, Z., 2008. Expandable data-driven graphical modeling of human
actions based on salient postures. IEEE Trans. Cir. and Sys. for Video Technol. 18,
1499–1510.
Li, W., Zhang, Z., Liu, Z., 2010. Action recognition based on a bag of 3D points, in:
2010 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition Workshops (CVPRW), pp. 9–14.
Li, B., Camps, O., Sznaier, M., 2012. Cross-View Activity Recognition Using
Hankelets. In: 2012 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), pp. 1362–1369.
Liu, J., Shah, M., 2008. Learning human actions via information maximization, in:
Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference
on, pp. 1–8.
Lowe, D.G., 2004. Distinctive image features from scale-invariant keypoints. Int. J.
Comput. Vision 60, 91–110.
Maimone, A., Fuchs, H., 2012. Reducing interference between multiple structured
light depth sensors using motion. In: Proceedings of the 2012 IEEE Virtual
Reality, IEEE Computer Society, Washington, DC, USA, pp. 51–54
Malgireddy, M., Inwogu, I., Govindaraju, V., 2012. A temporal bayesian model for
classifying, detecting and localizing activities in video sequences. In: 2012 IEEE
Computer Society Conference on Computer Vision and Pattern Recognition
Workshops (CVPRW), pp. 43–48.
Matyunin, S., Vatolin, D., Berdnikov, Y., Smirnov, M., 2011. Temporal filtering for
depth maps generated by kinect depth camera. In: 3DTV Conference: The True
Vision – Capture, Transmission and Display of 3D Video (3DTV-CON), 2011, pp.
1–4.
Niebles, J., Fei-Fei, L., 2007. A hierarchical model of shape and appearance for
human action classification. In: Computer Vision and Pattern Recognition, 2007.
CVPR ’07. IEEE Conference on, pp. 1–8.
Niebles, J., Wang, H., Fei-Fei, L., 2006. Unsupervised learning of human action
categories using spatial-temporal words. British Machine Vision Conference.
Ni, B., Wang, G., Moulin, P., 2011. RGBD-HuDaAct: A color-depth video database for
human daily activity recognition. In: 2011 IEEE International Conference on
Computer Vision Workshops (ICCV Workshops), pp. 1147–1153.
Nister, D., Stewenius, H., 2006. Scalable recognition with a vocabulary tree. In:
Computer Vision and Pattern Recognition, 2006 IEEE Computer Society
Conference on, pp. 2161–2168.
Poppe, R., 2010. A survey on vision-based human action recognition. Image Vision
Comput. 28, 976–990.
Reyes, M., Dominguez, G., Escalera, S., 2011. Featureweighting in dynamic
timewarping for gesture recognition in depth data. In: 2011 IEEE
International Conference on Computer Vision Workshops (ICCVW), pp. 1182–
1188.
Ryoo, M.S., Aggarwal, J.K., 2011. Stochastic representation and recognition of highlevel
group activities. Int. J. Comput. Vision 93, 183–200.
Sakoe, H., Chiba, S., 1978. Dynamic programming algorithm optimization for spoken
word recognition. Acoustics, Speech and Signal Processing, IEEE Transactions on
26, 43–49.
Savarese, S., DelPozo, A., Niebles, J., Fei-Fei, L., 2008. Spatial-temporal correlatons for
unsupervised action classification. In: Motion and video Computing, 2008.
WMVC 2008. IEEE Workshop on, pp. 1–8.
Schuldt, C., Laptev, I., Caputo, B., 2004. Recognizing human actions: a local svm
approach. In: Proceedings of the 17th International Conference on Pattern
Recognition, ICPR 2004, pp. 32–36 Vol. 3.
Scovanner, P., Ali, S., Shah, M., 2007. A 3-dimensional sift descriptor and its
application to action recognition. In: Proceedings of the 15th international
conference on Multimedia, p. 357.
Sempena, S., Maulidevi, N., Aryan, P., 2011. Human action recognition using
Dynamic Time Warping. In: 2011 International Conference on Electrical
Engineering and Informatics (ICEEI), pp. 1–5.
Sheikh, Y., Sheikh, M., Shah, M., 2005. Exploring the space of a human action. In:
Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, pp.
144–149 Vol. 1.
Shotton, J., Girshick, R., Fitzgibbon, A., Sharp, T., Cook, M., Finocchio, M., Moore, R.,
Kohli, P., Criminisi, A., Kipman, A., Blake, A., 2012. Efficient human pose
estimation from single depth images. Pattern Analysis and Machine
Intelligence, IEEE Transactions on, p. 1.
Siddiqui, M., Medioni, G., 2010. Human pose estimation from a single view point,
real-time range sensor. In: 2010 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1–8.
Skiena, S.S., 2008. The Algorithm Design Manual, second ed. Springer Publishing
Company, Incorporated.
Sung, J., Ponce, C., Selman, B., Saxena, A., 2012. Unstructured Human Activity
Detection from RGBD Images. 2012 IEEE International Conference on Robotics
and Automation.
Vedaldi, A., Zisserman, A., 2012. Efficient additive kernels via explicit feature maps.
Pattern Analysis and Machine Intelligence, IEEE Transactions on 34, 480–492.
Wang, H., Ullah, M.M., Kläser, A., Laptev, I., Schmid, C., 2009. Evaluation of local
spatio-temporal features for action recognition. In: British Machine Vision
Conference (BMVC), p. 127.
Wang, J., Liu, Z., Chorowski, J., Chen, Z., Wu, Y., 2012a. Robust 3D action recognition
with random occupancy patterns. In: Computer Vision – ECCV 2012, pp. 872–
885.
Wang, J., Liu, Z., Wu, Y., Yuan, J., 2012b. Mining actionlet ensemble for action
recognition with depth cameras. In: 2012 IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), pp. 1290–1297.
Weinland, D., Ronfard, R., Boyer, E., 2006. Free viewpoint action recognition using
motion history volumes. Comput. Vis. Image Underst 104, 249–257.
Weinland, D., Ronfard, R., Boyer, E., 2010. A survey of vision-based methods for
action representation, segmentation and recognition. Compouter Vision and
Image Understanding 115, 224–241.
Willems, G., Tuytelaars, T., Gool, L., 2008. An efficient dense and scale-invariant
spatio-temporal interest point detector. In: Proceedings of the 10th European
Conference on Computer Vision: Part II. Springer-Verlag, Berlin, Heidelberg, pp.
650–663.
Wolf, C., Mille, J., Lombardi, L., Celiktutan, O., Jiu, M., Baccouche, M., Dellandrea, E.,
Bichot, C.E., Garcia, C., Sankur, B., 2012. The LIRIS Human activities dataset and
the ICPR 2012 human activities recognition and localization competition.
Technical Report. RR-LIRIS-2012-004, LIRIS Laboratory.
Wong, S.F., Kim, T.K., Cipolla, R., 2007. Learning motion categories using both
semantic and structural information. In: Computer Vision and Pattern
Recognition, 2007. CVPR ’07. IEEE Conference on, pp. 1–6.
Wu, D., Zhu, F., Shao, L., 2012. One shot learning gesture recognition from RGBD
images. In: 2012 IEEE Computer Society Conference on Computer Vision and
Pattern Recognition Workshops (CVPRW), pp. 7–12.
Xia, L., Chen, C.C., Aggarwal, J., 2012. View invariant human action recognition using
histograms of 3d joints. In: Computer Vision and Pattern Recognition
Workshops (CVPRW), 2012 IEEE Computer Society Conference on, pp. 20–27.
Yilma, A., Shah, M., 2005. Recognizing human actions in videos acquired by
uncalibrated moving cameras. In: Computer Vision, 2005. ICCV 2005. Tenth
IEEE International Conference on, pp. 150–157 Vol. 1.
Yu, T.H., Kim, T.K., Cipolla, R., 2010. Real-time action recognition by spatiotemporal
semantic and structural forest. In: Proceedings of the British Machine Vision
Conference. BMVA Press. pp. 52 (1–52), 12. http://dx.doi.org/10.5244/C.24.52.
Zhang, H., Parker, L.E., 2011. 4-dimensional local spatio-temporal features for
human activity recognition. In: 2011 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS), pp. 2044–2049.
Zhang, J., Marszałek, M., Lazebnik, S., Schmid, C., 2007. Local features and kernels for
classification of texture and object categories: A comprehensive study. Int. J.
Comput. Vision 73, 213–238.
Zhao, Y., Liu, Z., Yang, L., Cheng, H., 2012. Combing rgb and depth map features for
human activity recognition. In: Signal Information Processing Association
Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific, pp. 1–4.