arXiv:2011.07191 [cs.LG]
Keywords: multimodal representation learning, early fusion, visual inputs, early multimodal fusion, initial c-lstm layer results