Oct 13, 2024 | 🚀 MMComposition has been publicly released. Read our Paper, check out the latest 🏆Leaderboard, and access the Code to evaluate your own models. |
Aug 23, 2024 | Introducing CaRDiff, a framework for video saliency prediction using MLLM CoT reasoning and a diffusion model. |
Aug 05, 2024 | 🏅 We won first place in the AIM 2024 Challenge on Video Saliency Prediction @ ECCV Workshop! Thanks to Gen Zhan and Li Yang! |
Jul 23, 2024 | 📢 We've recently updated our survey: "Video Understanding with Large Language Models: A Survey"! |
Jul 15, 2024 | One paper about egocentric video understanding with LLMs has been accepted by ACM MM 2024. |
Jun 18, 2024 | Introducing Differentiated Beam Decoding (DBD), a novel decoding strategy for LVLM hallucination mitigation. |
May 20, 2024 | 🌟 Started my internship at ByteDance in San Jose, CA, supervised by Yiting Liao & Gen Zhan. |
Apr 18, 2024 | Introducing the V2Xum-LLaMA model and the Instruct-V2Xum dataset for cross-modal video summarization. |
Mar 24, 2024 | Released AVicuna, an Audio-Visual LLM empowered by pseudo-untrimmed video annotations for audio-visual event localization. |
Dec 30, 2023 | 🔥🔥🔥 Released a survey on Video Understanding with LLMs (arXiv, GitHub). |
Aug 28, 2023 | Officially joined Chenliang Xu’s Group at UR CS as a Ph.D. student 🎓. |
Jul 23, 2023 | One paper accepted by the International Computer Music Conference (ICMC) 2023. |
Jun 29, 2023 | Graduated from SUSTech with my bachelor’s degree and the honor of Excellent Graduate for Exceptional Performance. |
Jun 18, 2023 | Our team won first place in the LOVEU (Long-form Video Understanding) Challenge at the CVPR’23 Workshop. |
May 25, 2023 | Successfully defended my undergraduate thesis, titled Language-Guided Video Cover Generation, which was awarded Excellent Undergraduate Thesis! |
May 04, 2023 | The technical report for Caption Anything has been released! |
Apr 12, 2023 | Our project Caption Anything has been released! Feel free to try our demo and star our GitHub repo! |
Feb 27, 2023 | Thrilled to announce I’ll be starting my Ph.D. in Computer Science at the University of Rochester in Fall 2023, working with Prof. Chenliang Xu! |
Sep 16, 2022 | One paper about multimodal Ad video editing has been accepted by the Asian Conference on Computer Vision (ACCV) 2022. |
Aug 16, 2022 | I left Tencent and joined SUSTech VIP Lab as an undergraduate student researcher. |
Sep 24, 2021 | Started my part-time internship at Tencent in Shenzhen, supervised by Dr. Wenhao Jiang and Qin Lin, while balancing it with my university coursework. |