A survey of human motion analysis using depth imagery

Chen, Lulu; Wei, Hong; Ferryman, James

Chen, L., Wei, H. ORCID: https://orcid.org/0000-0002-9664-5748 and Ferryman, J. (2013) A survey of human motion analysis using depth imagery. Pattern Recognition Letters, 34 (15). pp. 1995-2006. ISSN 0167-8655 doi: 10.1016/j.patrec.2013.02.006

Abstract/Summary

Analysis of human behaviour through visual information has been a highly active research topic in the computer vision community. This was previously achieved via images from a conventional camera, but recently depth sensors have made a new type of data available. This survey starts by explaining the advantages of depth imagery, then describes the new sensors that are available to obtain it. In particular, the Microsoft Kinect has made high-resolution real-time depth cheaply available. The main published research on the use of depth imagery for analysing human activity is reviewed. Much of the existing work focuses on body part detection and pose estimation. A growing research area addresses the recognition of human actions. The publicly available datasets that include depth imagery are listed, as are the software libraries that can acquire it from a sensor. This survey concludes by summarising the current state of work on this topic, and pointing out promising future research directions.

Item Type	Article
URI	https://centaur.reading.ac.uk/id/eprint/31463
Identification Number/DOI	10.1016/j.patrec.2013.02.006
Refereed	Yes
Divisions	Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science
Uncontrolled Keywords	Range data, depth sensor, survey, human pose estimation, human action recognition, 3D body model; 2010 MSC: 68-02, 68T45
Publisher	Elsevier

Deposit Details

Date Deposited:	11 Mar 2013 11:32
Last Modified:	01 Mar 2026 05:31