arXiv:2503.08507 [cs.CV]

Keywords: multimodal large language model, better reflect real-world applications, experimental results reveal, detect multiple individuals, achieve real-world usability