arXiv:2204.02547 [cs.CV]

Keywords: multi-modal features, modeling motion, linguistic features, multi-modal alignment loss, language-guided feature fusion module