News

Oct 13, 2024 🚀 MMComposition has been publicly released. Read our Paper, check out the latest 🏆Leaderboard, and access the Code to evaluate your own models.
Aug 23, 2024 Introducing CaRDiff, a framework for video saliency prediction using MLLM CoT reasoning and a diffusion model.
Aug 05, 2024 🏅 We won first place in the AIM 2024 Challenge on Video Saliency Prediction @ ECCV Workshop! Thanks to Gen Zhan and Li Yang!
Jul 23, 2024 📢 We've recently updated our survey: "Video Understanding with Large Language Models: A Survey"!
Jul 15, 2024 One paper about egocentric video understanding with LLMs has been accepted by ACM MM 2024.
Jun 18, 2024 Introducing Differentiated Beam Decoding (DBD), a novel decoding strategy for LVLM hallucination mitigation.
May 20, 2024 🌟 Started my internship at ByteDance in San Jose, CA, supervised by Yiting Liao & Gen Zhan.
Apr 18, 2024 Introducing V2Xum-LLaMA model and Instruct-V2Xum dataset for cross-modal video summarization.
Mar 24, 2024 Released AVicuna, an Audio-Visual LLM empowered by pseudo-untrimmed video annotations for audio-visual event localization.
Dec 30, 2023 🔥🔥🔥 Released a survey on Video Understanding with LLMs (arXiv, GitHub).
Aug 28, 2023 Officially joined Chenliang Xu’s Group at UR CS as a Ph.D. student 🎓.
Jul 23, 2023 One paper accepted by the International Computer Music Conference (ICMC) 2023.
Jun 29, 2023 Graduated from SUSTech with my bachelor’s degree and the honor of Excellent Graduate for Exceptional Performance.
Jun 18, 2023 Our team won first place in the LOVEU (Long-form Video Understanding) Challenge at the CVPR’23 Workshop.
May 25, 2023 Successfully defended my undergraduate thesis, titled Language-Guided Video Cover Generation, which was awarded Excellent Undergraduate Thesis!
May 04, 2023 The technical report for Caption Anything has been released!
Apr 12, 2023 Our project Caption Anything has been released! Feel free to try our demo and star our GitHub repo!
Feb 27, 2023 Thrilled to announce I’ll be starting my Ph.D. in Computer Science at the University of Rochester in Fall 2023, working with Prof. Chenliang Xu!
Sep 16, 2022 One paper about multimodal Ad video editing has been accepted by the Asian Conference on Computer Vision (ACCV) 2022.
Aug 16, 2022 I left Tencent and joined SUSTech VIP Lab as an undergraduate student researcher.
Sep 24, 2021 Started my part-time internship at Tencent in Shenzhen, supervised by Dr. Wenhao Jiang and Qin Lin, which I balanced with my university coursework.