Created on April 18, 2024
2024
Introducing the V2Xum-LLaMA model and the Instruct-V2Xum dataset for cross-modal video summarization.