📢 We’ve recently updated our survey: “Video Understanding with Large Language Models: A Survey”!

✨ This comprehensive survey covers video understanding techniques powered by large language models (Vid-LLMs), along with training strategies, relevant tasks, datasets, benchmarks, and evaluation methods, and discusses the applications of Vid-LLMs across various domains.

🚀 What’s New in This Update:
✅ Updated to include around 100 additional Vid-LLMs and 15 new benchmarks as of June 2024.
✅ Introduced a novel taxonomy for Vid-LLMs based on video representation and LLM functionality.
✅ Added a Preliminary chapter that reclassifies video understanding tasks from the perspectives of granularity and language involvement, and added an LLM Background section.
✅ Added a new Training Strategies subsection and removed adapters as a factor in model classification.

Thanks to all authors for their contributions and support ❤️

This major update will be followed by a series of minor updates. We hope you enjoy the read and welcome your feedback.

🔗 arXiv: https://arxiv.org/pdf/2312.17432

🔗 GitHub: https://github.com/yunlong10/Awesome-LLMs-for-Video-Understanding