May 31, 2025 | Introducing MMPerspective, a comprehensive benchmark for MLLMs on perspective understanding. |
May 27, 2025 | Started my internship as an Applied Scientist Intern at Amazon in Bellevue, WA. |
May 03, 2025 | Our Vid-LLM survey has been accepted by the IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)! IEEE Xplore | GitHub |
Apr 09, 2025 | Caption Anything in Video (CAT-V) has been released! arXiv | GitHub |
Feb 26, 2025 | π Two papers have been accepted to CVPR 2025, including our VidComposition benchmark! |
Feb 05, 2025 | I will join Amazon as an Applied Scientist Intern this summer. |
Jan 13, 2025 | Introducing our survey paper on GenAI for Cel-Animation. arXiv | GitHub |
Dec 09, 2024 | Three papers on Video-LLMs have been accepted to AAAI 2025! |
Nov 23, 2024 | We have released VidComposition, a benchmark to evaluate MLLMs' understanding of video compositions. Project Page | Paper | Leaderboard |
Oct 13, 2024 | MMComposition has been publicly released. Read our Paper, check out the latest Leaderboard, and access the Code to evaluate your own models. |
Aug 23, 2024 | Introducing CaRDiff, a framework for video saliency prediction that combines MLLM chain-of-thought reasoning with a diffusion model. |
Aug 05, 2024 | We won first place in the AIM 2024 Challenge on Video Saliency Prediction at the ECCV Workshop! Thanks to Gen Zhan and Li Yang! |
Jul 23, 2024 | We've recently updated our survey: "Video Understanding with Large Language Models: A Survey"! |
Jul 15, 2024 | One paper on egocentric video understanding with LLMs has been accepted to ACM MM 2024. |
Jun 18, 2024 | Introducing Differentiated Beam Decoding (DBD), a novel decoding strategy for LVLM hallucination mitigation. |
May 20, 2024 | Started my internship at ByteDance in San Jose, CA, mentored by Yiting Liao & Gen Zhan. |
Apr 18, 2024 | Introducing the V2Xum-LLaMA model and the Instruct-V2Xum dataset for cross-modal video summarization. |
Mar 24, 2024 | Released AVicuna, an Audio-Visual LLM empowered by pseudo-untrimmed video annotations for audio-visual event localization. |
Feb 09, 2024 | I will join ByteDance as a Research Intern this summer. |
Dec 30, 2023 | Released a survey on Video Understanding with LLMs (arXiv, GitHub). |
Aug 28, 2023 | Officially joined Chenliang Xu's group at UR CS as a Ph.D. student. |
Jul 23, 2023 | One paper accepted to the International Computer Music Conference (ICMC) 2023. |
Jun 29, 2023 | Graduated from SUSTech with my bachelor's degree and the honor of Excellent Graduate for Exceptional Performance. |
Jun 18, 2023 | Our team won first place in the LOVEU (Long-form Video Understanding) Challenge at the CVPR'23 Workshop. |
May 25, 2023 | Successfully defended my undergraduate thesis, Language-Guided Video Cover Generation, which was awarded Excellent Undergraduate Thesis! |
May 04, 2023 | The technical report for Caption Anything has been released! |
Apr 12, 2023 | Our project Caption Anything has been released! Try our demo and star our GitHub repo! |
Feb 27, 2023 | Thrilled to announce I'll be starting my Ph.D. in Computer Science at the University of Rochester in Fall 2023, working with Prof. Chenliang Xu! |
Sep 16, 2022 | One paper on multimodal ad video editing has been accepted to the Asian Conference on Computer Vision (ACCV) 2022. |
Aug 16, 2022 | I left Tencent and joined SUSTech VIP Lab as an undergraduate student researcher. |
Sep 24, 2021 | Started my part-time internship at Tencent in Shenzhen, supervised by Dr. Wenhao Jiang and Qin Lin, while balancing my university coursework. |