arXiv:2401.01524 [cs.CV]

Multimodal self-supervised learning for lesion localization

Hao Yang, Hong-Yu Zhou, Cheng Li, Weijian Huang, Jiarun Liu, Yong Liang, Shanshan Wang

Published 2024-01-03, Version 1

Multimodal deep learning that combines imaging with diagnostic reports has made impressive progress in medical imaging diagnostics, proving particularly useful for auxiliary diagnosis when annotation information is scarce. Nonetheless, accurately localizing diseases without detailed positional annotations remains a challenge. Although existing methods have attempted to use local information to achieve fine-grained semantic alignment, they are limited in extracting fine-grained semantics from the full context of reports. To solve this problem, we introduce a new method that takes full sentences from textual reports as the basic units for local semantic alignment. Our approach combines chest X-ray images with their corresponding textual reports, performing contrastive learning at both global and local levels. The leading results our method obtains on multiple datasets confirm its efficacy for lesion localization.
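The abstract describes the scheme but gives no implementation details. Below is a minimal sketch of what global plus sentence-level image-text contrastive learning could look like, assuming a CLIP-style symmetric InfoNCE objective at the global level and sentence-to-patch attention pooling at the local level; all function names, tensor dimensions, and the attention-pooling choice are illustrative assumptions, not the authors' actual method.

```python
# Minimal sketch (not the paper's code): contrastive alignment of chest
# X-ray features and report text at global and sentence (local) levels.
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings a[i] <-> b[i]."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                  # (N, N) similarities
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def local_alignment_loss(patch_feats, sent_feats, temperature=0.07):
    """Sentence-level alignment (an assumed design): each report sentence
    attends over image patches, and the attended visual summary is
    contrasted against that sentence's embedding within the batch.
    patch_feats: (B, P, D) image patch embeddings
    sent_feats:  (B, S, D) sentence embeddings, S sentences per report
    """
    B, S, D = sent_feats.shape
    attn = torch.softmax(
        torch.einsum('bsd,bpd->bsp', sent_feats, patch_feats) / D ** 0.5,
        dim=-1)                                       # (B, S, P) weights
    attended = torch.einsum('bsp,bpd->bsd', attn, patch_feats)
    return info_nce(sent_feats.reshape(B * S, D),
                    attended.reshape(B * S, D), temperature)

# Dummy tensors standing in for image/text encoder outputs:
B, P, S, D = 8, 49, 4, 256
img_global, txt_global = torch.randn(B, D), torch.randn(B, D)
patches, sentences = torch.randn(B, P, D), torch.randn(B, S, D)
total = info_nce(img_global, txt_global) + local_alignment_loss(patches, sentences)
print(total.item())
```

Using whole sentences as the local units, as the abstract proposes, sidesteps the ambiguity of word-level alignment: a sentence such as "small left pleural effusion" carries a complete clinical assertion that can plausibly be matched to a region of the image.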

Related articles:
arXiv:2104.11178 [cs.CV] (Published 2021-04-22)
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
arXiv:2411.05597 [cs.CV] (Published 2024-11-08)
Predicting Stroke through Retinal Graphs and Multimodal Self-supervised Learning
arXiv:2204.11227 [cs.CV] (Published 2022-04-24)
Lesion Localization in OCT by Semi-Supervised Object Detection