arXiv:2408.16986 [cs.CV]
Keywords: visual tokens, dynamic input scaling, versatile scene understanding, multimodal large language model, adaptvision