May 31, 2025 | Introducing MMPerspective, a comprehensive benchmark for MLLMs on perspective understanding. |
May 27, 2025 | Started my internship as an Applied Scientist Intern at Amazon in Bellevue, WA. |
May 03, 2025 | Our Vid-LLM survey has been accepted by the IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)! IEEE Xplore | GitHub |
Apr 09, 2025 | Caption Anything in Video (CAT-V) has been released! arXiv | GitHub |
Feb 26, 2025 | π Two papers have been accepted to CVPR 2025, including our VidComposition benchmark! |
Feb 05, 2025 | I will join Amazon as an Applied Scientist Intern this summer. |
Jan 13, 2025 | Introducing our survey paper on GenAI for Cel-Animation. arXiv | GitHub |
Dec 09, 2024 | Three papers on Video-LLMs have been accepted to AAAI 2025! |
Nov 23, 2024 | We have released VidComposition, a benchmark to evaluate MLLMs' understanding of video compositions. Project Page | Paper | Leaderboard |
Oct 13, 2024 | MMComposition has been publicly released. Read our Paper, check out the latest Leaderboard, and access the Code to evaluate your own models. |
Aug 23, 2024 | Introducing CaRDiff, a framework for video saliency prediction that combines MLLM chain-of-thought reasoning with a diffusion model. |
Aug 05, 2024 | We won first place in the AIM 2024 Challenge on Video Saliency Prediction at the ECCV Workshop! Thanks to Gen Zhan and Li Yang! |
Jul 23, 2024 | We've recently updated our survey: "Video Understanding with Large Language Models: A Survey"! |
Jul 15, 2024 | One paper on egocentric video understanding with LLMs has been accepted to ACM MM 2024. |
Jun 18, 2024 | Introducing Differentiated Beam Decoding (DBD), a novel decoding strategy for LVLM hallucination mitigation. |
May 20, 2024 | Started my internship at ByteDance in San Jose, CA, mentored by Yiting Liao & Gen Zhan. |
Apr 18, 2024 | Introducing the V2Xum-LLaMA model and the Instruct-V2Xum dataset for cross-modal video summarization. |
Mar 24, 2024 | Released AVicuna, an Audio-Visual LLM empowered by pseudo-untrimmed video annotations for audio-visual event localization. |
Feb 09, 2024 | I will join ByteDance as a Research Intern this summer. |
Dec 30, 2023 | Released a survey on Video Understanding with LLMs (arXiv, GitHub). |
Aug 28, 2023 | Officially joined Chenliang Xu's group at UR CS as a Ph.D. student. |
Jul 23, 2023 | One paper accepted to the International Computer Music Conference (ICMC) 2023. |
Jun 29, 2023 | Graduated from SUSTech with my bachelor's degree and the honor of Excellent Graduate for Exceptional Performance. |
Jun 18, 2023 | Our team won first place in the LOVEU (Long-form Video Understanding) Challenge at the CVPR'23 Workshop. |
May 25, 2023 | Successfully defended my undergraduate thesis, Language-Guided Video Cover Generation, which was awarded Excellent Undergraduate Thesis! |
May 04, 2023 | The technical report for Caption Anything has been released! |
Apr 12, 2023 | Our project Caption Anything has been released! Try our demo and star our GitHub repo! |
Feb 27, 2023 | Thrilled to announce I'll be starting my Ph.D. in Computer Science at the University of Rochester in Fall 2023, working with Prof. Chenliang Xu! |
Sep 16, 2022 | One paper on multimodal ad video editing has been accepted to the Asian Conference on Computer Vision (ACCV) 2022. |
Aug 16, 2022 | I left Tencent and joined SUSTech VIP Lab as an undergraduate student researcher. |
Sep 24, 2021 | Started my part-time internship at Tencent in Shenzhen, supervised by Dr. Wenhao Jiang and Qin Lin, while balancing my university coursework. |