arXiv:2403.00476 [cs.CV]
Keywords: video llms really understand videos, poor temporal perception ability, video large language models, surrounding video large language, temporal aspect