TVFace: towards large-scale unsupervised face recognition in video streams
Khurshid, A., Khan, B., Shahzad, M.
It is advisable to refer to the publisher's version if you intend to cite from this work.
DOI: 10.1007/s10044-025-01464-3

Abstract/Summary
Recent advances in deep learning have led to significant improvements in face recognition systems, but face clustering, particularly in video streams, remains a challenging problem. Current video face clustering approaches are primarily tailored to short-form content such as movies and television shows, which feature only a limited number of face images and individuals. The few existing large-scale face datasets are derived from web images and do not effectively capture the complexities of the video domain. In view of these limitations, we present TVFace, the first large-scale dataset of face images extracted from long-form video content. TVFace is sourced from public live streams of international news channels and contains 2.6 million face images of 33 thousand individuals. To address the challenge of identity annotation in unstructured video streams, we design a semi-automatic annotation framework that combines unsupervised face clustering with human validation, enabling scalable, high-quality labeling. TVFace is well suited for evaluating and advancing the face representation and identity classification components of face recognition systems in both the image and video domains. We also demonstrate the effectiveness of TVFace for evaluating real-time person retrieval systems, using a novel tree-search-based Hierarchical Retrieval Index tailored to online face clustering. In conclusion, our work centers on the preparation of TVFace, a dataset poised to reshape face recognition in the video domain and a crucial resource for the research community.