arXiv:2501.09532 [cs.CV]

Keywords: visual tokens, cross-modality attention mixture mechanism, visual-language alignment, approach achieves state-of-the-art training-free VLM, achieves state-of-the-art training-free VLM acceleration