Feb 26, 2025 | Two papers have been accepted to CVPR 2025, including our VidComposition benchmark! |
Feb 05, 2025 | I will join Amazon as an Applied Scientist Intern this summer. |
Jan 13, 2025 | Introducing our survey paper on GenAI for Cel-Animation. arXiv | GitHub |
Dec 09, 2024 | Three papers on Video-LLMs have been accepted to AAAI 2025! |
Nov 23, 2024 | We have released VidComposition, a benchmark to evaluate MLLMs' understanding of video compositions. Project Page | Paper | Leaderboard |
Oct 13, 2024 | MMComposition has been publicly released. Read our Paper, check out the latest Leaderboard, and access the Code to evaluate your own models. |
Aug 23, 2024 | Introducing CaRDiff, a framework for video saliency prediction using MLLM CoT reasoning and a diffusion model. |
Aug 05, 2024 | We won first place in the AIM 2024 Challenge on Video Saliency Prediction @ ECCV Workshop! Thanks to Gen Zhan and Li Yang! |
Jul 23, 2024 | We've recently updated our survey: "Video Understanding with Large Language Models: A Survey"! |
Jul 15, 2024 | One paper on egocentric video understanding with LLMs has been accepted to ACM MM 2024. |
Jun 18, 2024 | Introducing Differentiated Beam Decoding (DBD), a novel decoding strategy for LVLM hallucination mitigation. |
May 20, 2024 | Started my internship at ByteDance in San Jose, CA, mentored by Yiting Liao & Gen Zhan. |
Apr 18, 2024 | Introducing the V2Xum-LLaMA model and the Instruct-V2Xum dataset for cross-modal video summarization. |
Mar 24, 2024 | Released AVicuna, an Audio-Visual LLM empowered by pseudo-untrimmed video annotations for audio-visual event localization. |
Feb 09, 2024 | I will join ByteDance as a Research Intern this summer. |
Dec 30, 2023 | Released a survey on Video Understanding with LLMs (arXiv, GitHub). |
Aug 28, 2023 | Officially joined Chenliang Xu's group at UR CS as a Ph.D. student. |
Jul 23, 2023 | One paper has been accepted to the International Computer Music Conference (ICMC) 2023. |
Jun 29, 2023 | Graduated from SUSTech with my bachelor's degree and the honor of Excellent Graduate for Exceptional Performance. |
Jun 18, 2023 | Our team won first place in the LOVEU (Long-form Video Understanding) Challenge at the CVPR'23 Workshop. |
May 25, 2023 | Successfully defended my undergraduate thesis, "Language-Guided Video Cover Generation", which received the Excellent Undergraduate Thesis award! |
May 04, 2023 | The technical report for Caption Anything has been released! |
Apr 12, 2023 | Our project Caption Anything has been released! Feel free to try our demo and star our GitHub repo! |
Feb 27, 2023 | Thrilled to announce I'll be starting my Ph.D. in Computer Science at the University of Rochester in Fall 2023, working with Prof. Chenliang Xu! |
Sep 16, 2022 | One paper on multimodal ad video editing has been accepted to the Asian Conference on Computer Vision (ACCV) 2022. |
Aug 16, 2022 | I left Tencent and joined SUSTech VIP Lab as an undergraduate student researcher. |
Sep 24, 2021 | Started my part-time internship at Tencent in Shenzhen, supervised by Dr. Wenhao Jiang and Qin Lin, alongside my university coursework. |