📢 We’ve recently updated our survey: “Video Understanding with Large Language Models: A Survey”!
✨ This comprehensive survey covers video understanding techniques powered by large language models (Vid-LLMs), along with their training strategies, relevant tasks, datasets, benchmarks, and evaluation methods, and discusses the applications of Vid-LLMs across various domains.
🚀 What’s New in This Update:
✅ Updated to include around 100 additional Vid-LLMs and 15 new benchmarks as of June 2024.
✅ Introduced a novel taxonomy for Vid-LLMs based on video representation and LLM functionality.
✅ Added a Preliminary chapter that reclassifies video understanding tasks by granularity and language involvement, along with a new LLM Background section.
✅ Added a new Training Strategies subsection and removed adapters as a factor in model classification.
Thanks to all authors for their contributions and support ❤️
This major update will be followed by a series of minor updates. We welcome you to read it and share your feedback.
🔗 arXiv: https://arxiv.org/pdf/2312.17432
🔗 GitHub: https://github.com/yunlong10/Awesome-LLMs-for-Video-Understanding