arXiv:2312.12423 [cs.CV]
Keywords: designing general-purpose coarse-to-fine vision-language model, coarse-to-fine instruction tuning dataset, multiple input images, VL tasks