arXiv Analytics

arXiv:2411.14823 [cs.CV]

Omni-IML: Towards Unified Image Manipulation Localization

Chenfan Qu, Yiwu Zhong, Fengjun Guo, Lianwen Jin

Published 2024-11-22, Version 1

Image manipulation can lead to misinterpretation of visual content, posing significant risks to information security. Image Manipulation Localization (IML) has thus received increasing attention. However, existing IML methods rely heavily on task-specific designs, so they perform well only on one target image type and are close to random guessing on other image types; even joint training on multiple image types causes significant performance degradation. This hinders deployment in real applications: it notably increases maintenance costs, and misclassification of image types leads to serious error accumulation. To this end, we propose Omni-IML, the first generalist model to unify diverse IML tasks. Specifically, Omni-IML achieves generalism by adopting the Modal Gate Encoder and the Dynamic Weight Decoder, which adaptively determine the optimal encoding modality and the optimal decoder filters for each sample. We additionally propose an Anomaly Enhancement module that enhances the features of tampered regions with box supervision and helps the generalist model extract common features across different IML tasks. We validate our approach on IML tasks across three major scenarios: natural images, document images, and face images. Without bells and whistles, Omni-IML achieves state-of-the-art performance on all three tasks with a single unified model, providing valuable strategies and insights for real-world applications and future research in generalist image forensics. Our code will be publicly available.
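
The per-sample selection mentioned in the abstract can be pictured with a toy sketch. This is not the authors' implementation, and the abstract does not specify how the Modal Gate Encoder or Dynamic Weight Decoder are built. The sketch only assumes that a gate produces one score per candidate encoding modality and that a second set of scores blends candidate decoder filters. The function names, the modality labels, and the use of softmax are all hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_modality(gate_logits, modalities):
    """Hypothetical gate: pick the modality with the highest probability
    for a single sample, and return the probabilities as well."""
    probs = softmax(gate_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return modalities[best], probs


def mix_filters(mix_logits, filter_bank):
    """Hypothetical dynamic weighting: blend candidate filters (equal-length
    lists of floats) using per-sample softmax weights."""
    weights = softmax(mix_logits)
    size = len(filter_bank[0])
    return [
        sum(w * f[i] for w, f in zip(weights, filter_bank))
        for i in range(size)
    ]


# Example with made-up numbers for one sample.
modality, probs = select_modality([0.1, 2.0], ["modality_a", "modality_b"])
mixed = mix_filters([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy setting each sample gets its own gate scores, so two images of different types can be routed to different encodings and decoded with different filter blends while sharing one set of model parameters.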

Related articles:
arXiv:2407.05645 [cs.CV] (Published 2024-07-08)
OneDiff: A Generalist Model for Image Difference
arXiv:2307.13345 [cs.CV] (Published 2023-07-25)
Do humans and Convolutional Neural Networks attend to similar areas during scene classification: Effects of task and image type
arXiv:2407.10125 [cs.CV] (Published 2024-07-14)
When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset