arXiv:2312.08870 [cs.CV]
Keywords: visual tokens, reliable video narrator, equal distance, video question answering benchmarks, vista-llama