A survey of human motion analysis using depth imagery

Chen, L., Wei, H. (ORCID: https://orcid.org/0000-0002-9664-5748) and Ferryman, J. (2013) A survey of human motion analysis using depth imagery. Pattern Recognition Letters, 34 (15). pp. 1995-2006. ISSN 0167-8655

DOI: 10.1016/j.patrec.2013.02.006

Abstract/Summary

Analysis of human behaviour through visual information has been a highly active research topic in the computer vision community. This was previously done with images from conventional cameras, but depth sensors have recently made a new type of data available. This survey starts by explaining the advantages of depth imagery, then describes the new sensors that are available to obtain it. In particular, the Microsoft Kinect has made high-resolution, real-time depth data cheaply available. The main published research on the use of depth imagery for analysing human activity is reviewed. Much of the existing work focuses on body part detection and pose estimation, while a growing research area addresses the recognition of human actions. The publicly available datasets that include depth imagery are listed, as are the software libraries that can acquire it from a sensor. The survey concludes by summarising the current state of work on this topic and pointing out promising future research directions.
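As a minimal illustration of what depth imagery adds over a conventional intensity image, each valid pixel of a depth image can be back-projected to a 3D point in the camera frame. This is not taken from the survey itself. It is a sketch assuming an ideal pinhole camera with hypothetical intrinsics `fx`, `fy` (focal lengths) and `cx`, `cy` (principal point), all in pixels, with depth stored in metres and 0 marking a missing reading. Real sensors such as the Kinect also need lens-distortion correction and depth-to-colour registration, which are omitted here.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def depth_to_points(depth: List[List[float]], fx: float, fy: float,
                    cx: float, cy: float) -> List[Point3D]:
    """Back-project a depth image into camera-frame 3D points.

    Assumes an ideal pinhole model (no lens distortion). fx, fy, cx, cy are
    hypothetical intrinsics in pixels; depth values are in metres, and a
    value <= 0 is treated as "no reading" and skipped.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):        # v: image row index
        for u, z in enumerate(row):        # u: image column index
            if z <= 0:                     # skip missing depth readings
                continue
            x = (u - cx) * z / fx          # inverse of u = fx * x / z + cx
            y = (v - cy) * z / fy          # inverse of v = fy * y / z + cy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # A toy 2x2 depth image with one missing pixel (the 0.0 entry).
    toy_depth = [[1.0, 0.0],
                 [2.0, 2.0]]
    print(depth_to_points(toy_depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0))
```

Because every pixel carries a metric distance, the resulting point set can be segmented or matched in 3D directly, without first solving a correspondence problem between two views.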