arXiv Analytics

arXiv:2003.10925 [cs.CV]

Learning Compact Reward for Image Captioning

Nannan Li, Zhenzhong Chen

Published 2020-03-24, Version 1

Adversarial learning has proven effective at generating natural and diverse descriptions in image captioning. However, the reward learned by existing adversarial methods is vague and ill-defined because of the reward ambiguity problem. In this paper, we propose a refined Adversarial Inverse Reinforcement Learning (rAIRL) method that handles reward ambiguity by disentangling the reward for each word in a sentence, and that achieves stable adversarial training by refining the loss function to shift the generator towards Nash equilibrium. In addition, we introduce a conditional term in the loss function to mitigate mode collapse and increase the diversity of the generated descriptions. Our experiments on MS COCO and Flickr30K show that our method can learn a compact reward for image captioning.

Related articles:
arXiv:1912.08226 [cs.CV] (Published 2019-12-17)
M$^2$: Meshed-Memory Transformer for Image Captioning
arXiv:1708.05271 [cs.CV] (Published 2017-08-17)
Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects
arXiv:2210.10914 [cs.CV] (Published 2022-10-19)
Prophet Attention: Predicting Attention with Future Attention for Improved Image Captioning