arXiv:2403.09072 [cs.CV]
Keywords: multimodal large language models, unified codebook, compact token representation, compress visual signals, restricts MLLMs' ability