arXiv:2311.11865 [cs.CV]

VLM-Eval: A General Evaluation on Video Large Language Models

Shuailin Li, Yuang Zhang, Yucheng Zhao, Qiuyue Wang, Fan Jia, Yingfei Liu, Tiancai Wang

Published 2023-11-20 (Version 1)

Despite the rapid development of video Large Language Models (LLMs), a comprehensive evaluation is still absent. In this paper, we introduce a unified evaluation that encompasses multiple video tasks, including captioning, question answering, retrieval, and action recognition. In addition to conventional metrics, we showcase how GPT-based evaluation can match human-like performance in assessing response quality across multiple aspects. We propose a simple baseline, Video-LLaVA, which uses a single linear projection and outperforms existing video LLMs. Finally, we evaluate video LLMs beyond academic datasets and observe encouraging recognition and reasoning capabilities in driving scenarios with only hundreds of video-instruction pairs for fine-tuning. We hope our work can serve as a unified evaluation for video LLMs and help extend them to more practical scenarios. The evaluation code will be available soon.
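
The baseline mentioned above connects a vision encoder to the LLM through nothing more than a single linear projection. The following PyTorch sketch illustrates that idea only; the class name, feature dimensions, and token layout are assumptions for illustration, not the authors' released implementation.

    import torch
    import torch.nn as nn

    class LinearVideoProjector(nn.Module):
        """Sketch of a single-linear-projection adapter: per-frame visual
        features are mapped into the LLM embedding space by one nn.Linear.
        Dimensions below (1024 -> 4096) are assumed, not from the paper."""

        def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
            super().__init__()
            # One linear map is the entire vision-to-language adapter.
            self.proj = nn.Linear(vision_dim, llm_dim)

        def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
            # frame_feats: (batch, num_frames, num_patches, vision_dim)
            b, t, p, _ = frame_feats.shape
            tokens = self.proj(frame_feats)  # (b, t, p, llm_dim)
            # Flatten frames and patches into one visual token sequence,
            # which would be prepended to the text embeddings fed to the LLM.
            return tokens.reshape(b, t * p, -1)

    # Usage with dummy CLIP-style features: 2 clips, 8 frames, 256 patches.
    feats = torch.randn(2, 8, 256, 1024)
    visual_tokens = LinearVideoProjector()(feats)
    print(visual_tokens.shape)  # torch.Size([2, 2048, 4096])
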

Related articles:
arXiv:2411.12951 [cs.CV] (Published 2024-11-20)
On the Consistency of Video Large Language Models in Temporal Comprehension
arXiv:2408.04223 [cs.CV] (Published 2024-08-08)
VideoQA in the Era of LLMs: An Empirical Study
Junbin Xiao et al.
arXiv:2403.00476 [cs.CV] (Published 2024-03-01)
TempCompass: Do Video LLMs Really Understand Videos?
Yuanxin Liu et al.