arXiv:2411.14823 [cs.CV]

Omni-IML: Towards Unified Image Manipulation Localization

Chenfan Qu, Yiwu Zhong, Fengjun Guo, Lianwen Jin

Published 2024-11-22, Version 1

Image manipulation can lead to misinterpretation of visual content, posing significant risks to information security. Image Manipulation Localization (IML) has thus received increasing attention. However, existing IML methods rely heavily on task-specific designs: they perform well only on one target image type and are close to random guessing on other image types, and even joint training on multiple image types causes significant performance degradation. This hinders deployment in real applications, since it notably increases maintenance costs, and misclassification of image types leads to serious error accumulation. To this end, we propose Omni-IML, the first generalist model to unify diverse IML tasks. Specifically, Omni-IML achieves generalism by adopting the Modal Gate Encoder and the Dynamic Weight Decoder to adaptively determine the optimal encoding modality and the optimal decoder filters for each sample. We additionally propose an Anomaly Enhancement module that enhances the features of tampered regions with box supervision and helps the generalist model extract common features across different IML tasks. We validate our approach on IML tasks across three major scenarios: natural images, document images, and face images. Without bells and whistles, our Omni-IML achieves state-of-the-art performance on all three tasks with a single unified model, providing valuable strategies and insights for real-world application and future research in generalist image forensics. Our code will be publicly available.
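
The abstract does not say how the per-sample choices are made, so the following is only a toy sketch of the general idea of per-sample gating and per-sample filter mixing. It is not the authors' implementation. Every name in it (`modal_gate`, `dynamic_filter`, `gate_weights`, `filter_bank`, `mix_weights`) is hypothetical, and the softmax gate and the weighted filter bank are assumptions chosen for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def modal_gate(sample_feat, gate_weights):
    """Hypothetical gate: score each candidate encoding branch for one sample.

    Returns the index of the highest-scoring branch and all branch probabilities.
    """
    # One logit per branch: dot product of the sample feature with that branch's gate vector.
    logits = [sum(w * x for w, x in zip(row, sample_feat)) for row in gate_weights]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


def dynamic_filter(sample_feat, filter_bank, mix_weights):
    """Hypothetical dynamic weights: blend a bank of filters per sample.

    The mixing coefficients depend on the sample, so each sample gets its own filter.
    """
    logits = [sum(w * x for w, x in zip(row, sample_feat)) for row in mix_weights]
    coeffs = softmax(logits)
    size = len(filter_bank[0])
    return [sum(c * f[i] for c, f in zip(coeffs, filter_bank)) for i in range(size)]


# Toy usage with made-up numbers: two branches, two filters, 2-d sample feature.
feat = [1.0, 0.0]
weights = [[2.0, 0.0], [0.0, 2.0]]
branch, probs = modal_gate(feat, weights)
filt = dynamic_filter(feat, [[1.0, 0.0], [0.0, 1.0]], weights)
```

In this toy setup the sample feature aligns with the first gate vector, so the first branch is selected and the blended filter leans toward the first filter in the bank. A real model would learn these weights and operate on feature maps rather than short lists.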

Categories: cs.CV, cs.CR, cs.LG

Keywords: unified image manipulation localization, image type, iml methods rely heavily, omni-iml achieves state-of-the-art performance, generalist model

Related articles:

arXiv:2407.05645 [cs.CV] (Published 2024-07-08)

OneDiff: A Generalist Model for Image Difference

Erdong Hu, Longteng Guo, Tongtian Yue, Zijia Zhao, Shuning Xue, Jing Liu

arXiv:2307.13345 [cs.CV] (Published 2023-07-25)

Do humans and Convolutional Neural Networks attend to similar areas during scene classification: Effects of task and image type

Romy Müller, Marcel Duerschmidt, Julian Ullrich, Carsten Knoll, Sascha Weber, Steffen Seitz

arXiv:2407.10125 [cs.CV] (Published 2024-07-14)

When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset

Yi Zhang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu