Multimodal outlier optimizer for textual, numeric, and image data

Das, Krittika; Dey, Nilanjan; Misra, Bitan; Roy, Satyabrata; Sherratt, R. Simon

Text (Open Access): Published Version, available under a Creative Commons Attribution license.

Das, K., Dey, N., Misra, B., Roy, S. and Sherratt, R. S. (ORCID: https://orcid.org/0000-0001-7899-4445) (2025) Multimodal outlier optimizer for textual, numeric, and image data. IEEE Access, 13. 177420-177430. ISSN 2169-3536. doi: 10.1109/ACCESS.2025.3619826

Abstract/Summary

Ensuring the quality and reliability of multimodal video data is critical for applications that rely on accurate interpretation, such as medical imaging, surveillance, remote sensing, and intelligent manufacturing. However, outliers across different data types (visual, textual, and numerical) pose a major challenge. To address this, we propose the Multimodal Outlier Optimizer (MOO), a unified framework designed to detect and filter outliers from heterogeneous data modalities within video files. MOO decomposes each video into still images, text, and numeric sequences, allowing specialized algorithms to handle each modality: Nonlocal Means (NLM) for removing Gaussian noise in image frames, and Local Outlier Factor (LOF) for detecting contextual outliers in textual and numerical data. These filtered components are then recombined into a cleaned, optimized video. The system is trained and evaluated using synthetically generated datasets to simulate real-world noise while ensuring scalability and control. Performance is assessed using the Jaccard Similarity Score (JSS) and the Structural Similarity Index (SSIM), with results demonstrating consistent improvements even under high contamination levels (up to 50%), achieving SSIM scores above 0.77 across three domains: medical imaging, remote sensing, and zoomed video data. These results highlight MOO’s potential as an effective and adaptable tool for enhancing the integrity of multimodal video data in complex, real-world environments.
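
As an illustration of the numeric branch described in the abstract, the following is a minimal Python sketch of Local Outlier Factor scoring and a Jaccard similarity check. It is not the authors' implementation: the function names (`lof_scores`, `jaccard`), the sample readings, the neighbourhood size `k=3`, and the threshold of 2.0 are all assumptions chosen for the example. How the paper applies JSS is also not stated in this record; here it is assumed to compare the set of flagged indices with a known set of injected outliers. In practice a library implementation such as scikit-learn's `LocalOutlierFactor` would normally be used instead of this hand-written version.

```python
import math


def lof_scores(points, k):
    """Return a Local Outlier Factor per point (values well above 1 suggest an outlier).

    Simplified sketch: uses exactly k nearest neighbours (ties are not expanded)
    and assumes the points are distinct, so no reachability sum is zero.
    """
    n = len(points)
    # Pairwise Euclidean distances between all points.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # For each point: indices of its k nearest neighbours and its k-distance.
    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours divided by the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of indices."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


# Hypothetical numeric sequence extracted from a video, with one injected outlier.
readings = [10.0, 10.2, 9.9, 10.1, 10.3, 9.8, 25.0, 10.05]
injected = [6]  # index of the synthetic outlier (assumed ground truth)

scores = lof_scores([(x,) for x in readings], k=3)
THRESHOLD = 2.0  # assumed cut-off, not taken from the paper
flagged = [i for i, s in enumerate(scores) if s > THRESHOLD]
cleaned = [x for i, x in enumerate(readings) if i not in flagged]

print("flagged indices:", flagged)
print("cleaned sequence:", cleaned)
print("Jaccard vs. injected outliers:", jaccard(flagged, injected))
```

The image branch (Nonlocal Means denoising scored with SSIM) would follow the same filter-then-score pattern but needs an image-processing library, so it is not sketched here.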

Item Type	Article
URI	https://centaur.reading.ac.uk/id/eprint/125122
Identification Number/DOI	10.1109/ACCESS.2025.3619826
Refereed	Yes
Divisions	Life Sciences > School of Biological Sciences > Biomedical Sciences; Life Sciences > School of Biological Sciences > Department of Bio-Engineering
Publisher	IEEE

Date Deposited:	22 Oct 2025 13:01
Last Modified:	26 Oct 2025 08:00