Multimodal outlier optimizer for textual, numeric, and image data

Das, K., Dey, N., Misra, B., Roy, S. and Sherratt, R. S. ORCID: https://orcid.org/0000-0001-7899-4445 (2025) Multimodal outlier optimizer for textual, numeric, and image data. IEEE Access, 13. pp. 177420-177430. ISSN 2169-3536

Text (Open Access) - Published Version
· Available under License Creative Commons Attribution.
· Please see our End User Agreement before downloading.

It is advisable to refer to the publisher's version if you intend to cite from this work. See Guidance on citing.

To link to this item DOI: 10.1109/ACCESS.2025.3619826

Abstract/Summary

Ensuring the quality and reliability of multimodal video data is critical for applications that rely on accurate interpretation, such as medical imaging, surveillance, remote sensing, and intelligent manufacturing. However, the presence of outliers across different data types (visual, textual, and numerical) poses a major challenge. To address this, we propose the Multimodal Outlier Optimizer (MOO), a unified framework designed to detect and filter outliers from heterogeneous data modalities within video files. MOO decomposes each video into still images, text, and numeric sequences, so that a specialized algorithm can handle each modality: Nonlocal Means (NLM) removes Gaussian noise from image frames, and Local Outlier Factor (LOF) detects contextual outliers in textual and numerical data. The filtered components are then recombined into a cleaned, optimized video. The system is trained and evaluated on synthetically generated datasets that simulate real-world noise while ensuring scalability and control. Performance is assessed using the Jaccard Similarity Score (JSS) and the Structural Similarity Index (SSIM). Results show consistent improvements even under high contamination levels (up to 50%), with SSIM scores above 0.77 across three domains: medical imaging, remote sensing, and zoomed video data. These results highlight MOO’s potential as an effective and adaptable tool for enhancing the integrity of multimodal video data in complex, real-world environments.
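
To make the LOF and JSS steps named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It computes brute-force Local Outlier Factor scores for a short numeric sequence and then compares the flagged indices with a known set of injected outliers using the Jaccard similarity. The sample readings, the neighbourhood size `k=3`, the decision threshold `1.5`, and the function names `lof_scores` and `jaccard` are all illustrative assumptions; the abstract does not state the parameters used in MOO or exactly how JSS is applied.

```python
import math


def lof_scores(points, k=3):
    """Brute-force Local Outlier Factor for a small list of numeric tuples.

    Simplification (assumption): each neighbourhood holds exactly k points,
    with distance ties broken by index. Inputs with more than k exact
    duplicates would give a zero reachability sum and are not handled here.
    """
    n = len(points)
    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding itself.
    neigh = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_dist = [dist[i][neigh[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(i, j) = max(k-distance(j), d(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neigh[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbours divided by the point's own density.
    return [
        sum(lrd[j] for j in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(n)
    ]


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two index collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical numeric sequence with one injected outlier at index 6.
readings = [10.0, 10.2, 9.9, 10.1, 10.3, 9.8, 25.0]
injected = [6]

scores = lof_scores([(x,) for x in readings], k=3)
# Scores near 1 indicate inliers; the 1.5 cut-off is an illustrative choice.
detected = [i for i, s in enumerate(scores) if s > 1.5]
agreement = jaccard(detected, injected)
```

In this toy sequence the isolated reading sits far from a dense cluster, so its neighbours are much denser than it is and its LOF score is far above the cut-off, while the clustered readings score close to 1. In practice a library implementation of LOF would normally be preferred over this quadratic-time sketch.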

Item Type: Article
Refereed: Yes
Divisions: Life Sciences > School of Biological Sciences > Biomedical Sciences
Life Sciences > School of Biological Sciences > Department of Bio-Engineering
ID Code: 125122
Publisher: IEEE
