News

Feb 26, 2025 🎉 Two papers have been accepted to CVPR 2025, including our VidComposition benchmark!
Feb 05, 2025 I will join Amazon as an Applied Scientist Intern this summer.
Jan 13, 2025 🎨 Introducing our survey paper on GenAI for Cel-Animation 👉 arXiv | GitHub
Dec 09, 2024 🎉 Three papers on Video-LLMs have been accepted to AAAI 2025!
Nov 23, 2024 We have released VidComposition, a benchmark to evaluate MLLMs’ understanding of video compositions. 👉 Project Page | Paper | Leaderboard
Oct 13, 2024 🚀 MMComposition has been publicly released. Read our Paper, check out the latest 🏆Leaderboard, and access the Code to evaluate your own models.
Aug 23, 2024 Introducing CaRDiff, a framework for video saliency prediction using MLLM CoT reasoning and a diffusion model.
Aug 05, 2024 🏅 We won first place in the AIM 2024 Challenge on Video Saliency Prediction @ ECCV Workshop! Thanks to Gen Zhan and Li Yang!
Jul 23, 2024 📢 We've recently updated our survey: "Video Understanding with Large Language Models: A Survey"!
Jul 15, 2024 One paper on egocentric video understanding with LLMs has been accepted to ACM MM 2024.
Jun 18, 2024 Introducing Differentiated Beam Decoding (DBD), a novel decoding strategy for LVLM hallucination mitigation.
May 20, 2024 🌟 Started my internship at ByteDance in San Jose, CA, mentored by Yiting Liao & Gen Zhan.
Apr 18, 2024 Introducing the V2Xum-LLaMA model and the Instruct-V2Xum dataset for cross-modal video summarization.
Mar 24, 2024 Released AVicuna, an Audio-Visual LLM empowered by pseudo-untrimmed video annotations for audio-visual event localization.
Feb 09, 2024 I will join ByteDance as a Research Intern this summer.
Dec 30, 2023 🔥🔥🔥 Released a survey for Video Understanding with LLMs (arXiv, GitHub).
Aug 28, 2023 Officially joined Chenliang Xu’s group at UR CS as a Ph.D. student 🎓.
Jul 23, 2023 One paper accepted to the International Computer Music Conference (ICMC) 2023.
Jun 29, 2023 Graduated from SUSTech with my bachelor’s degree and the honor of Excellent Graduate for Exceptional Performance.
Jun 18, 2023 Our team won first place in the LOVEU (Long-form Video Understanding) Challenge at the CVPR’23 Workshop.
May 25, 2023 Successfully defended my undergraduate thesis, titled Language-Guided Video Cover Generation, which received the Excellent Undergraduate Thesis award!
May 04, 2023 The technical report for Caption Anything has been released!
Apr 12, 2023 Our project Caption Anything has been released! Feel free to try our demo and star our GitHub repo!
Feb 27, 2023 Thrilled to announce I’ll be starting my Ph.D. in Computer Science at the University of Rochester in Fall 2023, working with Prof. Chenliang Xu!
Sep 16, 2022 One paper on multimodal ad video editing has been accepted to the Asian Conference on Computer Vision (ACCV) 2022.
Aug 16, 2022 I left Tencent and joined SUSTech VIP Lab as an undergraduate student researcher.
Sep 24, 2021 Started my part-time internship at Tencent in Shenzhen, with supervision from Dr. Wenhao Jiang and Qin Lin. I had to balance it with my university coursework.