arXiv:2003.10925 [cs.CV]

Learning Compact Reward for Image Captioning

Published 2020-03-24, Version 1

Adversarial learning has proven effective at generating natural and diverse descriptions in image captioning. However, the reward learned by existing adversarial methods is vague and ill-defined due to the reward ambiguity problem. In this paper, we propose a refined Adversarial Inverse Reinforcement Learning (rAIRL) method that handles the reward ambiguity problem by disentangling the reward for each word in a sentence, and achieves stable adversarial training by refining the loss function to shift the generator towards Nash equilibrium. In addition, we introduce a conditional term in the loss function to mitigate mode collapse and increase the diversity of the generated descriptions. Our experiments on MS COCO and Flickr30K show that our method can learn a compact reward for image captioning.

Comments: 13 pages, 10 figures

Categories: cs.CV, cs.CL

Keywords: image captioning, learning compact reward, adversarial inverse reinforcement learning, reward ambiguity problem, loss function

Related articles:

arXiv:1912.08226 [cs.CV] (Published 2019-12-17)

M$^2$: Meshed-Memory Transformer for Image Captioning

Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, Rita Cucchiara

arXiv:1708.05271 [cs.CV] (Published 2017-08-17)

Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects

Ting Yao, Yingwei Pan, Yehao Li, Tao Mei

arXiv:2210.10914 [cs.CV] (Published 2022-10-19)

Prophet Attention: Predicting Attention with Future Attention for Improved Image Captioning