arXiv:2408.12867 [cs.CV]
Keywords: multimodal large language models, semantic alignment, visual tokens, MMLink dataset